{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text Processing\n",
    "\n",
    "## Capturing Text Data\n",
    "\n",
    "### Plain Text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the beginning God created the heaven and the earth. \n",
      "And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. \n",
      "And God said, Let there be light: and there was light. \n",
      "And God saw the light, that it was good: and God divided the light from the darkness. \n",
      "And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day. \n",
      "And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters. \n",
      "And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so. \n",
      "And God called the firmament Heaven. And the evening and the morning were the second day. \n",
      "And God said, Let the waters under the heaven be gathered together unto one place, and let the dry land appear: and it was so. \n",
      "And God called the dry land Earth; and the gathering together of the waters called he Seas: and God saw that it was good. \n",
      "And God said, Let the earth bring forth grass, the herb yielding seed, and the fruit tree yielding fruit after his kind, whose seed is in itself, upon the earth: and it was so. \n",
      "And the earth brought forth grass, and herb yielding seed after his kind, and the tree yielding fruit, whose seed was in itself, after his kind: and God saw that it was good. \n",
      "And the evening and the morning were the third day. \n",
      "And God said, Let there be lights in the firmament of the heaven to divide the day from the night; and let them be for signs, and for seasons, and for days, and years: \n",
      "And let them be for lights in the firmament of the heaven to give light upon the earth: and it was so. \n",
      "And God made two great lights; the greater light to rule the day, and the lesser light to rule the night: he made the stars also. \n",
      "And God set them in the firmament of the heaven to give light upon the earth, \n",
      "And to rule over the day and over the night, and to divide the light from the darkness: and God saw that it was good. \n",
      "And the evening and the morning were the fourth day. \n",
      "And God said, Let the waters bring forth abundantly the moving creature that hath life, and fowl that may fly above the earth in the open firmament of heaven. \n",
      "And God created great whales, and every living creature that moveth, which the waters brought forth abundantly, after their kind, and every winged fowl after his kind: and God saw that it was good. \n",
      "And God blessed them, saying, Be fruitful, and multiply, and fill the waters in the seas, and let fowl multiply in the earth. \n",
      "And the evening and the morning were the fifth day. \n",
      "And God said, Let the earth bring forth the living creature after his kind, cattle, and creeping thing, and beast of the earth after his kind: and it was so. \n",
      "And God made the beast of the earth after his kind, and cattle after their kind, and every thing that creepeth upon the earth after his kind: and God saw that it was good. \n",
      "And God said, Let us make man in our image, after our likeness: and let them have dominion over the fish of the sea, and over the fowl of the air, and over the cattle, and over all the earth, and over every creeping thing that creepeth upon the earth. \n",
      "So God created man in his own image, in the image of God created he him; male and female created he them. \n",
      "And God blessed them, and God said unto them, Be fruitful, and multiply, and replenish the earth, and subdue it: and have dominion over the fish of the sea, and over the fowl of the air, and over every living thing that moveth upon the earth. \n",
      "And God said, Behold, I have given you every herb bearing seed, which is upon the face of all the earth, and every tree, in the which is the fruit of a tree yielding seed; to you it shall be for meat. \n",
      "And to every beast of the earth, and to every fowl of the air, and to every thing that creepeth upon the earth, wherein there is life, I have given every green herb for meat: and it was so. \n",
      "And God saw every thing that he had made, and, behold, it was very good. And the evening and the morning were the sixth day. \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the whole file into a single string, then print it\n",
    "with open('genesis.txt', 'r') as f:\n",
    "    text = f.read()\n",
    "\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tabular Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product</th>\n",
       "      <th>Consumer complaint narrative</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mortgage</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>I have outdated information on my credit repor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Consumer Loan</td>\n",
       "      <td>I purchased a new car on XXXX XXXX. The car de...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Credit card</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Product                       Consumer complaint narrative\n",
       "0          Mortgage                                                NaN\n",
       "1  Credit reporting  I have outdated information on my credit repor...\n",
       "2     Consumer Loan  I purchased a new car on XXXX XXXX. The car de...\n",
       "3       Credit card                                                NaN\n",
       "4   Debt collection                                                NaN"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Read the complaints CSV, decoding it as ISO-8859-1 (Latin-1)\n",
    "df = pd.read_csv('Consumer_Complaints.csv', encoding='ISO-8859-1')\n",
    "df[['Product', 'Consumer complaint narrative']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product</th>\n",
       "      <th>Consumer complaint narrative</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>I have outdated information on my credit repor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Consumer Loan</td>\n",
       "      <td>I purchased a new car on XXXX XXXX. The car de...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>An account on my credit report has a mistaken ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>This company refuses to provide me verificatio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>This complaint is in regards to Square Two Fin...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Product                       Consumer complaint narrative\n",
       "1   Credit reporting  I have outdated information on my credit repor...\n",
       "2      Consumer Loan  I purchased a new car on XXXX XXXX. The car de...\n",
       "7   Credit reporting  An account on my credit report has a mistaken ...\n",
       "12   Debt collection  This company refuses to provide me verificatio...\n",
       "16   Debt collection  This complaint is in regards to Square Two Fin..."
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
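   ],
   "source": [
    "# Keep only rows that have a narrative; copy() gives an independent frame for the column assignments that follow\n",
    "df = df[df['Consumer complaint narrative'].notnull()].copy()\n",
    "df.head()[['Product', 'Consumer complaint narrative']]"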
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product</th>\n",
       "      <th>Consumer complaint narrative</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>i have outdated information on my credit repor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Consumer Loan</td>\n",
       "      <td>i purchased a new car on xxxx xxxx. the car de...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>an account on my credit report has a mistaken ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>this company refuses to provide me verificatio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>this complaint is in regards to square two fin...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Product                       Consumer complaint narrative\n",
       "1   Credit reporting  i have outdated information on my credit repor...\n",
       "2      Consumer Loan  i purchased a new car on xxxx xxxx. the car de...\n",
       "7   Credit reporting  an account on my credit report has a mistaken ...\n",
       "12   Debt collection  this company refuses to provide me verificatio...\n",
       "16   Debt collection  this complaint is in regards to square two fin..."
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Consumer complaint narrative'] = df['Consumer complaint narrative'].str.lower()\n",
    "df.head()[['Product', 'Consumer complaint narrative']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product</th>\n",
       "      <th>Consumer complaint narrative</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>i have outdated information on my credit repor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Consumer Loan</td>\n",
       "      <td>i purchased a new car on  . the car dealer cal...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Credit reporting</td>\n",
       "      <td>an account on my credit report has a mistaken ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>this company refuses to provide me verificatio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Debt collection</td>\n",
       "      <td>this complaint is in regards to square two fin...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Product                       Consumer complaint narrative\n",
       "1   Credit reporting  i have outdated information on my credit repor...\n",
       "2      Consumer Loan  i purchased a new car on  . the car dealer cal...\n",
       "7   Credit reporting  an account on my credit report has a mistaken ...\n",
       "12   Debt collection  this company refuses to provide me verificatio...\n",
       "16   Debt collection  this complaint is in regards to square two fin..."
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
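   ],
   "source": [
    "import re\n",
    "\n",
    "# Remove the redaction masks (standalone runs of 'x' such as 'xxxx') without deleting the letter x inside real words\n",
    "df['Consumer complaint narrative'] = df['Consumer complaint narrative'].str.replace(r'\\bx{2,}\\b', '', regex=True)\n",
    "df.head()[['Product', 'Consumer complaint narrative']]"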
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Online Resource"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"success\": {\n",
      "        \"total\": 1\n",
      "    },\n",
      "    \"contents\": {\n",
      "        \"quotes\": [\n",
      "            {\n",
      "                \"quote\": \"The real winners in life are the people who look at every situation with an expectation that they can make it work or make it better\",\n",
      "                \"author\": \"Barbara Pletcher\",\n",
      "                \"length\": \"132\",\n",
      "                \"tags\": [\n",
      "                    \"inspire\",\n",
      "                    \"winning\"\n",
      "                ],\n",
      "                \"category\": \"inspire\",\n",
      "                \"title\": \"Inspiring Quote of the day\",\n",
      "                \"date\": \"2018-05-10\",\n",
      "                \"id\": null\n",
      "            }\n",
      "        ],\n",
      "        \"copyright\": \"2017-19 theysaidso.com\"\n",
      "    }\n",
      "}\n"
     ]
    }
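   ],
   "source": [
    "import requests\n",
    "import json\n",
    "\n",
    "# A timeout and a status check keep a slow or failed request from going unnoticed\n",
    "r = requests.get(\"https://quotes.rest/qod.json\", timeout=10)\n",
    "r.raise_for_status()\n",
    "res = r.json()\n",
    "print(json.dumps(res, indent=4))"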
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'author': 'Barbara Pletcher',\n",
       " 'category': 'inspire',\n",
       " 'date': '2018-05-10',\n",
       " 'id': None,\n",
       " 'length': '132',\n",
       " 'quote': 'The real winners in life are the people who look at every situation with an expectation that they can make it work or make it better',\n",
       " 'tags': ['inspire', 'winning'],\n",
       " 'title': 'Inspiring Quote of the day'}"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res['contents']['quotes'][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The real winners in life are the people who look at every situation with an expectation that they can make it work or make it better \n",
      "-- Barbara Pletcher\n"
     ]
    }
   ],
   "source": [
    "q = res['contents']['quotes'][0]\n",
    "print(q['quote'], '\\n--', q['author'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normalization\n",
    "\n",
    "### Case Normalization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The first time you see The Second Renaissance it may look boring. Look at it at least twice and definitely watch part 2. It will change your view of the matrix. Are the human people the ones who started the war ? Is AI a bad thing ?\n"
     ]
    }
   ],
   "source": [
    "text = \"The first time you see The Second Renaissance it may look boring. Look at it at least twice and definitely watch part 2. It will change your view of the matrix. Are the human people the ones who started the war ? Is AI a bad thing ?\"\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the first time you see the second renaissance it may look boring. look at it at least twice and definitely watch part 2. it will change your view of the matrix. are the human people the ones who started the war ? is ai a bad thing ?\n"
     ]
    }
   ],
   "source": [
    "text = text.lower()\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Punctuation Removal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the first time you see the second renaissance it may look boring  look at it at least twice and definitely watch part 2  it will change your view of the matrix  are the human people the ones who started the war   is ai a bad thing  \n"
     ]
    }
   ],
   "source": [
    "text = re.sub(r'[^a-zA-Z0-9]', ' ', text)\n",
    "print(text)"
   ]
  },
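  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pattern above keeps only ASCII letters and digits, so accented letters are replaced along with the punctuation. The cell below is a minimal sketch of a Unicode-aware variant, not part of the original pipeline: it assumes a hypothetical `sample` string and uses `[^\\w\\s]`, which replaces only characters that are neither word characters nor whitespace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical sample, not taken from the notebook's data\n",
    "sample = \"Café owners don't agree: naïve pricing fails!\"\n",
    "\n",
    "# ASCII-only pattern used above: accented letters are turned into spaces too\n",
    "print(re.sub(r'[^a-zA-Z0-9]', ' ', sample.lower()))\n",
    "\n",
    "# Unicode-aware variant: \\w matches letters in any script, digits and the underscore\n",
    "print(re.sub(r'[^\\w\\s]', ' ', sample.lower()))"
   ]
  },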
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tokenization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['the', 'first', 'time', 'you', 'see', 'the', 'second', 'renaissance', 'it', 'may', 'look', 'boring', 'look', 'at', 'it', 'at', 'least', 'twice', 'and', 'definitely', 'watch', 'part', '2', 'it', 'will', 'change', 'your', 'view', 'of', 'the', 'matrix', 'are', 'the', 'human', 'people', 'the', 'ones', 'who', 'started', 'the', 'war', 'is', 'ai', 'a', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "words = text.split()\n",
    "print(words)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NLTK: Natural Language Toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dr. Smith graduated from the University of Washington. He later started an analytics firm called Lux, which catered to enterprise customers.\n"
     ]
    }
   ],
   "source": [
    "# Another sample text\n",
    "text = \"Dr. Smith graduated from the University of Washington. He later started an analytics firm called Lux, which catered to enterprise customers.\"\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Dr.', 'Smith', 'graduated', 'from', 'the', 'University', 'of', 'Washington', '.', 'He', 'later', 'started', 'an', 'analytics', 'firm', 'called', 'Lux', ',', 'which', 'catered', 'to', 'enterprise', 'customers', '.']\n"
     ]
    }
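   ],
   "source": [
    "# word_tokenize relies on NLTK's Punkt tokenizer data; if it is missing, run nltk.download('punkt') once\n",
    "# (newer NLTK releases name this resource 'punkt_tab')\n",
    "from nltk.tokenize import word_tokenize\n",
    "\n",
    "words = word_tokenize(text)\n",
    "print(words)"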
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Dr. Smith graduated from the University of Washington.', 'He later started an analytics firm called Lux, which catered to enterprise customers.']\n"
     ]
    }
   ],
   "source": [
    "from nltk.tokenize import sent_tokenize\n",
    "\n",
    "# Split text into sentences\n",
    "sentences = sent_tokenize(text)\n",
    "print(sentences)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', \"you're\", \"you've\", \"you'll\", \"you'd\", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', \"she's\", 'her', 'hers', 'herself', 'it', \"it's\", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', \"that'll\", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', \"don't\", 'should', \"should've\", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', \"aren't\", 'couldn', \"couldn't\", 'didn', \"didn't\", 'doesn', \"doesn't\", 'hadn', \"hadn't\", 'hasn', \"hasn't\", 'haven', \"haven't\", 'isn', \"isn't\", 'ma', 'mightn', \"mightn't\", 'mustn', \"mustn't\", 'needn', \"needn't\", 'shan', \"shan't\", 'shouldn', \"shouldn't\", 'wasn', \"wasn't\", 'weren', \"weren't\", 'won', \"won't\", 'wouldn', \"wouldn't\"]\n"
     ]
    }
   ],
   "source": [
    "# List NLTK's English stop words (needs the 'stopwords' corpus: nltk.download('stopwords'))\n",
    "from nltk.corpus import stopwords\n",
    "print(stopwords.words('english'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['the', 'first', 'time', 'you', 'see', 'the', 'second', 'renaissance', 'it', 'may', 'look', 'boring', 'look', 'at', 'it', 'at', 'least', 'twice', 'and', 'definitely', 'watch', 'part', '2', 'it', 'will', 'change', 'your', 'view', 'of', 'the', 'matrix', 'are', 'the', 'human', 'people', 'the', 'ones', 'who', 'started', 'the', 'war', 'is', 'ai', 'a', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "# Reset text\n",
    "text = \"The first time you see The Second Renaissance it may look boring. Look at it at least twice and definitely watch part 2. It will change your view of the matrix. Are the human people the ones who started the war ? Is AI a bad thing ?\"\n",
    "\n",
    "# Normalize it\n",
    "text = re.sub(r'[^a-zA-Z0-9]', ' ', text.lower())\n",
    "\n",
    "# Tokenize it\n",
    "words = text.split()\n",
    "print(words)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['first', 'time', 'see', 'second', 'renaissance', 'may', 'look', 'boring', 'look', 'least', 'twice', 'definitely', 'watch', 'part', '2', 'change', 'view', 'matrix', 'human', 'people', 'ones', 'started', 'war', 'ai', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "# Remove stop words; build the set once so the list is not reloaded for every word\n",
    "stop_words = set(stopwords.words('english'))\n",
    "words = [w for w in words if w not in stop_words]\n",
    "print(words)"
   ]
  },
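  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Stop-word removal is lossy: the NLTK list printed above includes negations such as `no`, `nor` and `not`, so filtering can change the apparent meaning of a sentence. The cell below is a minimal sketch using a hypothetical `review` sentence that is not from the original data. Keeping negations, as in its second half, is one possible adjustment and an assumption rather than the notebook's method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nltk.corpus import stopwords\n",
    "\n",
    "english_stops = set(stopwords.words('english'))\n",
    "\n",
    "# Hypothetical example sentence, already lowercased and free of punctuation\n",
    "review = 'this movie is not good'.split()\n",
    "\n",
    "# Plain filtering drops the negation along with the other stop words\n",
    "print([w for w in review if w not in english_stops])\n",
    "\n",
    "# Illustrative adjustment: take the negations out of the stop-word set first\n",
    "filtered_stops = english_stops - {'no', 'nor', 'not'}\n",
    "print([w for w in review if w not in filtered_stops])"
   ]
  },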
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part-of-Speech Tagging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('I', 'PRP'),\n",
       " ('always', 'RB'),\n",
       " ('lie', 'VBP'),\n",
       " ('down', 'RP'),\n",
       " ('to', 'TO'),\n",
       " ('tell', 'VB'),\n",
       " ('a', 'DT'),\n",
       " ('lie', 'NN'),\n",
       " ('.', '.')]"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk import pos_tag\n",
    "\n",
    "# Tag parts of speech (PoS)\n",
    "sentence = word_tokenize('I always lie down to tell a lie.')\n",
    "pos_tag(sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sentence Parsing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S\n",
      "  (NP I)\n",
      "  (VP\n",
      "    (VP (V shot) (NP (Det an) (N elephant)))\n",
      "    (PP (P in) (NP (Det my) (N pajamas)))))\n",
      "(S\n",
      "  (NP I)\n",
      "  (VP\n",
      "    (V shot)\n",
      "    (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))\n"
     ]
    }
   ],
   "source": [
    "import nltk\n",
    "\n",
    "# Define a custom grammar\n",
    "my_grammar = nltk.CFG.fromstring(\"\"\"\n",
    "S -> NP VP\n",
    "PP -> P NP\n",
    "NP -> Det N | Det N PP | 'I'\n",
    "VP -> V NP | VP PP\n",
    "Det -> 'an' | 'my'\n",
    "N -> 'elephant' | 'pajamas'\n",
    "V -> 'shot'\n",
    "P -> 'in'\n",
    "\"\"\")\n",
    "\n",
    "parser = nltk.ChartParser(my_grammar)\n",
    "\n",
    "# Parse a sentence\n",
    "sentence = word_tokenize(\"I shot an elephant in my pajamas\")\n",
    "for tree in parser.parse(sentence):\n",
    "    print(tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAADaCAIAAAAwi45UAAAJMmlDQ1BkZWZhdWx0X3JnYi5pY2MAAEiJlZVnUJNZF8fv8zzphUASQodQQ5EqJYCUEFoo0quoQOidUEVsiLgCK4qINEWQRQEXXJUia0UUC4uCAhZ0gywCyrpxFVFBWXDfGZ33HT+8/5l7z2/+c+bec8/5cAEgiINlwct7YlK6wNvJjhkYFMwE3yiMn5bC8fR0A9/VuxEArcR7ut/P+a4IEZFp/OW4uLxy+SmCdACg7GXWzEpPWeGjy0wPj//CZ1dYsFzgMt9Y4eh/eexLzr8s+pLj681dfhUKABwp+hsO/4b/c++KVDiC9NioyGymT3JUelaYIJKZttIJHpfL9BQkR8UmRH5T8P+V/B2lR2anr0RucsomQWx0TDrzfw41MjA0BF9n8cbrS48hRv9/z2dFX73kegDYcwAg+7564ZUAdO4CQPrRV09tua+UfAA67vAzBJn/eqiVDQ0IgALoQAYoAlWgCXSBETADlsAWOAAX4AF8QRDYAPggBiQCAcgCuWAHKABFYB84CKpALWgATaAVnAad4Dy4Aq6D2+AuGAaPgRBMgpdABN6BBQiCsBAZokEykBKkDulARhAbsoYcIDfIGwqCQqFoKAnKgHKhnVARVApVQXVQE/QLdA66At2EBqGH0Dg0A/0NfYQRmATTYQVYA9aH2TAHdoV94fVwNJwK58D58F64Aq6HT8Id8BX4NjwMC+GX8BwCECLCQJQRXYSNcBEPJBiJQgTIVqQQKUfqkVakG+lD7iFCZBb5gMKgaCgmShdliXJG+aH4qFTUVlQxqgp1AtWB6kXdQ42jRKjPaDJaHq2DtkDz0IHoaHQWugBdjm5Et6OvoYfRk+h3GAyGgWFhzDDOmCBMHGYzphhzGNOGuYwZxExg5rBYrAxWB2uF9cCGYdOxBdhK7EnsJewQdhL7HkfEKeGMcI64YFwSLg9XjmvGXcQN4aZwC3hxvDreAu+Bj8BvwpfgG/Dd+Dv4SfwCQYLAIlgRfAlxhB2ECkIr4RphjPCGSCSqEM2JXsRY4nZiBfEU8QZxnPiBRCVpk7ikEFIGaS/pOOky6SHpDZlM1iDbkoPJ6eS95CbyVfJT8nsxmpieGE8sQmybWLVYh9iQ2CsKnqJO4VA2UHIo5ZQzlDuUWXG8uIY4VzxMfKt4tfg58VHxOQmahKGEh0SiRLFEs8RNiWkqlqpBdaBGUPOpx6hXqRM0hKZK49L4tJ20Bto12iQdQ2fRefQ4ehH9Z/oAXSRJlTSW9JfMlqyWvCApZCAMDQaPkcAoYZxmjDA+SilIcaQipfZItUoNSc1Ly0nbSkdKF0q3SQ9Lf5RhyjjIxMvsl+mUeSKLktWW9ZLNkj0ie012Vo4uZynHlyuUOy33SB6W15b3lt8sf0y+X35OQVHBSSFFoVLhqsKsIkPRVjFOsUzxouKMEk3JWilWqUzpktILpiSTw0xgVjB7mSJleWVn5QzlOuUB5QUVloqfSp5Km8oTVYIqWzVKtUy1R1WkpqTmrpar1qL2SB2vzlaPUT+k3qc+r8HSCNDYrdGpMc2SZvFYOawW1pgmWdNGM1WzXvO+FkaLrRWvdVjrrjasbaIdo12tfUcH1jHVidU5rDO4Cr3KfFXSqvpVo7okXY5upm6L7rgeQ89NL0+vU++Vvpp+sP5+/T79zwYmBgkGDQaPDamGLoZ5ht2GfxtpG/GNqo3uryavdly9bXXX6tfGOsaRxkeMH5jQTNxNdpv0mHwyNTMVmLaazpipmYWa1ZiNsulsT3Yx+4Y52tzOfJv5efMPFqYW6RanLf6y1LWMt2y2nF7DWhO5pmHNhJWKVZhVnZXQmmkdan3UWmijbBNmU2/zzFbVNsK20XaKo8WJ45zkvLIzsBPYtdvNcy24W7iX7RF7J/tC+wEHqoOfQ5XDU0cVx2jHFkeRk4nTZqfLzmhnV+f9zqM8
BR6f18QTuZi5bHHpdSW5+rhWuT5z03YTuHW7w+4u7gfcx9aqr01a2+kBPHgeBzyeeLI8Uz1/9cJ4eXpVez33NvTO9e7zofls9Gn2eedr51vi+9hP0y/Dr8ef4h/i3+Q/H2AfUBogDNQP3BJ4O0g2KDaoKxgb7B/cGDy3zmHdwXWTISYhBSEj61nrs9ff3CC7IWHDhY2UjWEbz4SiQwNCm0MXwzzC6sPmwnnhNeEiPpd/iP8ywjaiLGIm0iqyNHIqyiqqNGo62ir6QPRMjE1MecxsLDe2KvZ1nHNcbdx8vEf88filhICEtkRcYmjiuSRqUnxSb7JicnbyYIpOSkGKMNUi9WCqSOAqaEyD0tandaXTlz/F/gzNjF0Z45nWmdWZ77P8s85kS2QnZfdv0t60Z9NUjmPOT5tRm/mbe3KVc3fkjm/hbKnbCm0N39qzTXVb/rbJ7U7bT+wg7Ijf8VueQV5p3tudATu78xXyt+dP7HLa1VIgViAoGN1tubv2B9QPsT8M7Fm9p3LP58KIwltFBkXlRYvF/OJbPxr+WPHj0t6ovQMlpiVH9mH2Je0b2W+z/0SpRGlO6cQB9wMdZcyywrK3BzcevFluXF57iHAo45Cwwq2iq1Ktcl/lYlVM1XC1XXVbjXzNnpr5wxGHh47YHmmtVagtqv14NPbogzqnuo56jfryY5hjmceeN/g39P3E/qmpUbaxqPHT8aTjwhPeJ3qbzJqamuWbS1rgloyWmZMhJ+/+bP9zV6tua10bo63oFDiVcerFL6G/jJx2Pd1zhn2m9az62Zp2WnthB9SxqUPUGdMp7ArqGjzncq6n27K7/Ve9X4+fVz5ffUHyQslFwsX8i0uXci7NXU65PHsl+spEz8aex1cDr97v9eoduOZ67cZ1x+tX+zh9l25Y3Th/0+LmuVvsW523TW939Jv0t/9m8lv7gOlAxx2zO113ze92D64ZvDhkM3Tlnv296/d5928Prx0eHPEbeTAaMip8EPFg+mHCw9ePMh8tPN4+hh4rfCL+pPyp/NP637V+bxOaCi+M24/3P/N59niCP/Hyj7Q/Fifzn5Ofl08pTTVNG02fn3Gcufti3YvJlykvF2YL/pT4s+aV5quzf9n+1S8KFE2+Frxe+rv4jcyb42+N3/bMec49fZf4bmG+8L3M+xMf2B/6PgZ8nFrIWsQuVnzS+tT92fXz2FLi0tI/QiyQvpTNDAsAAAAJcEhZcwAADdcAAA3XAUIom3gAAAAddEVYdFNvZnR3YXJlAEdQTCBHaG9zdHNjcmlwdCA5LjIzKPqaOAAAGsJJREFUeJzt3U+I49idB/A3u5PZTTUMVkPVkoFQJTUJSzVhg1WVQ2ahCvzqUD3snErew0J65lAy9OSYlnybHpaA1dMQWJgO8hwyvbvsQepDIEz3wW+yVYRMNikrJAdXyMFqN6ED64J+TcC1u5OA9/DSGrXLVrls2frj7+dky5b95J/80/tnv5f6/T4BAIjDXyRdAADIDyQUAIgNEgoAxAYJBQBig4QCALF5OekCpJfv+77vE0IopUmXBSAbUEMZrl6vU0objUaj0VAURVGUpEsEkAEvYR7KWb7vU0o9z5MkiRDCOb98+TI+KIBzoYYyBOdcURSRTQghkiTZtp1skQAyATWU4VRVpZSWy2VVVZMuC0BmoIYynOd5V65csW2bUqqqquu6SZcIIANQQzmf6FJhjKFrFiAaaihD1Ot1z/OCu4qiaJomhpABIAISyhDtdjvcC8s5d10X1ROAc2Fi20iiX5ZzzhirVqtIKADnQh/KSJxz0fDBTFmAMSGhAEBs0IcCALFBQgGA2CChAEBsMMozBO/1vE6n0Wp5jx//V7u98uqr6urqhiyrq6v06tWkSweQXuiU/TOv0/E6nWan43U6v3z8WGyUl5f/7stf7vf7h7/97bPTU7GxtL4e5BdlZSW5IgOkzuImlHA1xOt0zuYLur4uXboUPN/vdtnxcbvbZcfH4YxD19ev
rKzQ9XV1bW3+RwGQKouVUERSOFsNmSApsFZr/GQEsCDyn1DEN98/OWHHxzP65g9tLhVXV9W1tY21NXVtDZUXWBA5TCij2iZBBpnp15v3euz4uPnokff48SfHx2JjYWlJXVtTV1d3rl5Fty7kWE4SSnQ1JMHeU6/TCbLbo5MTsbG4uho0stCtC3mS1YTid7ve48fNR4+GVkPSOb4blHmg8kLX11NbZoALyVJCYa1W8IUMrvZpqIZMJmeHA0BSnlAiLunK8nKe+iOS7fcBiEvqEsrQ6/aidTrMYWQKYBaSTyijhkXyVw2ZzKgpvHR9HWPSkDbJJBSMfUwmPLt36Ji0uraGygskaE4J5dzZGfgmTAB5GdJmhgll1OmO+aOzgJYjpEGcCWXoz+1QDUkExqQhEdMmFHQZpl8WJwFCRl04oVz0V/+QNhiThtkZK6HwXs89OorlV/+QKtH/56BvbSG5wIWMW0N56e23Ca5juXa27tl8911cKuBCxk0oXqeDc2uhIOIwgeRnygJAbmAZDQCIDRIKAMTm83V5xNrgiqIoiiK2MMYIIZIkqaoa3BXCT4NM833f931CiKqqkiQNbBHLxQsIOpzr8z4U0zRd1+Wc+74vSZLv+7qui9tiu2manueJ5EII8X3fdd3gLmSUpmn3798vlUqVSkXTNEKIaZqMMd/333zzzSdPniDocAH9EMMwZFk2DCO8pdFoBHdLpVJwu9lsyrLch+zb398PR7nf79dqNdu2xW0EHcY32IeiaZrneeHWzSiqqiqKMs4zIeXK5bLjOOEtjuOI2soABB2iDemUrdfrpmmeu6doHKH2mwOUUsYY51zcZYxRSkV/ygAEHaINWSxdURRKqWVZZ9OK7/vBRtd1LcsaetpB5lQqleBCYtu2ZVnBQwg6jG9IQiGEVKtVSunZSq8kSTs7O+J2+JyDrNM0jVJqmqbv+5zz8GgOgg7jG55QJEmqVqumaQ4ME0qSRCmdS8FgrhRFEYPEjuOUy+XwQwg6jG/kxDZRPQlPQ4B8K5fLtm0zxnRdT7oskFWfJxTGmOu6rusGlyPLsoKEIjrqPM+jlFJKXddNoLAwS5qmiSgHWxB0uCj8OBAAYoPf8gBAbJBQACA2SCgAEBskFBjO73b/8ze/SboUkDHD56EM8Ltd03Wrb7yB/wTMt4G12f7m1Vf/+w9/EEsRYjUfGMd4CeXk5H6zWdnennFhIAFiVY2za7NpGxtfvnz5d0+fsuPj2w8fiicHq/lgqQMYaqyEAnkyat0vbXMzYlGU8Go+95tNsRGrIMAAJJSFMGqdaWN3d8y2DA2tjhxeLjKovGDVaiBIKDkW0ZaZcu10kTLE/PzwIu0fHh5+eHhIsEj7AkNCyY/J2jJTki5d0jY3tc1NcVdUhUR+ud9sivoLunUXx7hLkV7+9reN3V3rxd+hQuJGtWXS8AWOXqQd3bq5NFYNBf1tqTK7tky8lJUVZWUlqLygW3cRoMmTAYm0ZWKHbt1FgISSUtOPy6QZunXzaty/L3jp7bfRhzJro9oy6urq4nyvwt26aesVgnOhhpKkfLRl4hVu74Q/H8zWzQQklHnLd1smXujWzZxxE0phaYmfns60KDmWlXGZlEO3bvqN24dC33+fEMJu3pxxeXJiVFuGrq8vbFtmdsLdup8cH4uN6NZNBBJKbNI8x2yhoFs3QehDmQraMimEbt0EIaFcDMZlsgXdunOGhDIWv9utPXiAcZmsG7Nbt/7WW4kVMePG7UOpHxwoy8uLXIeX3nln0eaYLY5wt67X6fAPPki6RFmFhb4AIDb413sAiA0SCgDEBgkFAGJz/igPYyx8V5IkVVVnVp4k+b7v+z4hRFVVSZIGtnieFzxTURRFUZIqJ0wvfFafjWb0oxDhnE5Z3/d1XRc3pOdc151X8eZK07T79++XSqVKpaJpGiHENE3GmO/7b7755pMnTzzPC5Kp7/uu6+Y1t+ab53mmaY6KZvSjcI7+eAzDaDQaYz45
u/b39wcOs1ar2bYtbpdKpWB7s9mUZXmuhYNYRUcTsZ4M+lBeUC6XHccJb3EcR9RWBqiqqijKQHsQMio6moj1+JBQXkApZYxxzsVdxhilVPSnDOCc+76PanA+REcTsR4fpt4PqlQq9XrdNE1CiG3blmUFD/m+L7YTQlzXtSxraK6BTIiOJmI9GSSUQZqmUUpN0/R9n3Me7uGXJGlnZ0fcDicayKLoaCLWk0FCGaQoihgkdhyn/OKfckuSRClNqmAQr+hoItaTQR/KEOVy2bZtxpgYMgeAMZ2TUDzPo5RSSl3XNU1T3J5PyRKkaZrojg22iLvhTyPB4sGUoqOJWE8DvzYGgNigyQMAsUFCAYDYIKEAQGyQUM7nd7t3f/xjv9tNuiAwc6zV+sFPfsJ7vaQLklXolI3CWq3agwefHB9feuWV3mef7W9tlTc38YeyueR1OqbrinXC5OXl6rVr+vZ20oXKHiSU4dyjI/vg4JPj48LSkr61tfu1r/3Hz3/+4eEhIaS0vl69dg1pJTfEkgYfHh6KWP/tl770zz/60aOTE3l5uX79OgJ9IUgog+oHB7UHDx6dnIjTq/rGG8FaLX63ax8c1A8Pn52e4iKWA7zXq338sVhDY39rq3rtWrAiSv3gwHTdZ6enuH5cCBLKn/Fezz06EqlEXl6ubG/rW1tDl30SZ6HbbJ77TEgz03HEtaG0vl6/fv3s4koi0OI5A+kGRkFCeeG8uVC9I6IuA2kWBG6c2gfv9UzXFa1dY3cXUY620Akl3IQpra9XtreDNSvHF/S2kDPVZkgb1mqZrvvLx48v2mL1u13Tde83m7h4RFvQhBL0w5GYOllZq2UfHIjlcjEYlELBIE5haan6xhvmtWsTvEgw6ocetFEWLqGwVss5OhKpZH9rq7K9HePy5rHnKZjewCDO9JUL9+jIdF0MAw21QAkluLyQGbdNMBiUEhGDONPDMNBQC5FQ6gcH9sHBLx8/nmcDGINByTp3EGd6GAY6K+cJJQ0DMWkow0K50CDO9DAMFJbPhDJQO0hDowODQXMw8SDO9DAMJOQtoYRrocXV1cr2duKpJAyDQTMSyyDO9DAMlJ+EkqERlgwVNf1iH8SZ3iIPA+UhoYS/n3sbG5Xt7UyEEINBU5rpIM70FnMYKNsJZW4jwbODwaDJzGEQZ3oLOAyU1YQy8PcCle3trIcKg0FjmvMgzvQWahgoewkl3188DAZFSHAQZ3oLMgyUsYSi3rolzqd8Nw3Cg0HOjRsT/GQxf6wHD6qum+wgzvSCRnpxddW7dSvp4sQvYwnFevBAWlrK1qVpYqKz2dK0vObNCxF92Pm4sLtHR/7JSXbTYoSMJRQASDP86z0AxAYJBQBig4QCALF5OekCnMP3fd/3CSGqqkqSNHRLpnHOPc9TFEVRFLGFMUYIkSRJVdXgrhB+2iLI9LEvaGT76ba3t0cIKZVKjuOILYZhFIvFQqFgGEayZYuFYRiyLBcKhadPn/b7/Xa7XSqVZFkuFovtdrvZbJZKpUKhUHpOluVms5l0qech68e+mJFNe0Lp9/v7+/uNRiO8pVar2badVHliJ868cH40DCN8yKVSKbjdbDZlWZ5r+RKV6WNfwMhmoA+lXC47jhPe4jiOpmlJlWcWNE3zPC9cBx5FVVVFUcZ5Zv5k8dgXLbIZSCiUUsYY51zcZYxRSnPQezKgXq+bpnnu0zjnvu+LRviiyeixL1Rk094pK1QqlSAqtm1blpV0ieKnKAql1LKssyef7/vBRtd1LcvKXz4dJQfHvlCRzUZC0TSNUmqapu/7nPOc9IefUa1WKaVnW3OSJO3s7IjbuUymEfJx7IsT2WwkFEVRVFX1PM9xnHK5nHRxZkWSpGq1aprmQMaUJIlSmlSpkpWPY1+cyGagD0Uol8u2bTPGdF1PuiwzJC5inuclXRCI2aJENulhpgsYGIHLh0ajIcuyLMvBCGK73S4UCmJwsdFohGcrBJNxFkHWj30xI4tfGwNA
bDLT5AGA9ENCAYDYIKEAQGwyllC8TifpIsyP1+kYjmO++LODhcV7Pb/bTboU8WCt1j/Zdv3gIOmCxC9LnbKm49x++LD/gx8kXZCZCxZe+KuXX/6/P/1JXl62NG3B/6r6pbffNnZ3rYzPQgr++/6vv/CF//3jH0vr65amqWtrSZcrNhmroSwC03EUw/jw8HB/a+v4u9+1r1/nvV757l36/vu5uUQvJtNx1Pfeu99sGru7v//e94zdXa/T2XjvPf2jj3ivl3Tp4oEaSooEa+IOLGEVXnMz9ytFjULff58Qwm7eTLogk2Ctln7vnohsuEoSXpg508uDBJBQUsHvdvV79z45Po5Ywip4TmFpydK0BVlLJJDRhBJe32tU1IIFzIqrq5ampX8txAjZ+C1Pjo1f+1BWVtjNm6IWU7l3zzk6ylnzO3+CBZijI0uvXvWuXrUePKh9/PHOnTt7GxuWpmV0xUgklCSJc+jZ6en455C2ualtboozdeO99/a3trASWAqNauNEMK9d07e2xNWFHR/rW1tZ7IFGQklGcMJNVsu1yuXK9rYYCXKPjvLR/M6HcBvHvn79Qi1T6dIlq1wuf+MbpuvefvjQbTYzN7qXpT6U+sFB5d69tmVltDYojNOoHl+emt/RMtGHMmYbZxz1g4PagweijlO/fj0r5/xf3srOis281/u3Tz/9h69/PSsf7gDe69364Q//8fvfP/79743dXfedd/7+q1+d8jWVlZXK9vYXX3nlR7/61YeHh60nTzbW1nLZAvrXTz8lhHzr9deTLshwrNWid+58/Otfv/6Vr7g3bnzr9de/+Mor07ygurb2rW9+s9/v//vPfvYvjP3PZ59trK1N+ZpzkKUaCmu1du7caXznO1m8DtcPDkzXfXZ6OqMLTjAXrrC0lNHmdzT9o4/coyP+wQdJF2RQvFXOoa+fodE99KHMXNAqkZeX3Rs3ZpQNpUuX6m+9JTpWMtr8jiYtLT07PU26FINibOOMIkb3RKdb+kf3MFN2hvxuV//oo507d/yTk5qm+bdvz7pupa6tsZs3Mbl2DlirpRjG7YcP1bW15rvvWuXyTFua9OpV//bt9E+uRQ1lJsTsEnHtmv/Irr69rW1uigHIK6a5sJNrZ2SacZwpidE9Mbk2naN7SCjxC/fPJ1U7FQOQle1t/d692w8f1g8P09/8zoQ5tHGiKSsr9bfeKm9umq5bdV3nF79I1egeEkqcvE7HdF0xg965cSPxLgxMro3RBHPVZie1k2uzNMpDUvwb9vQPsgSX1oxOrk3wl1yzHseZRvDTjZSceKihxCD8Xa1eu5aGC8VZ+Zhcy3u9OafCxNs40dI2uRY1lKmM+sOBNMvo5Nr5z0JKVRtnHGmYXJuxYePC0tKV1Fz/xagwIcS+fp3dvJmVbya9etW7daumaf7JiXb3btLFuQB5eXluFQTe62l37/JeTwQ3/dmEEKJvb3vvvmvs7n5yfKzfu5dIGTJWQ0kb1mqpmZ3qzns9r9PJSh6cv+wG1+92+elpIkkQCQUAYpOxJg8ApBkSCgDEJnV/X8AY03W91Wrt7OwkWAb/OUVRkirGjHDOf/rTnxJCJEkSW8Txcs5fe+21RIv2Akrpa6+9Fvvnn9fgpiWsCS3Sfo5gwfq57RhoNpuyLMuyXCqVSqVSsVjc399/+vTpPMswU4ZhyLJcKBTEQbXb7VKpJMtysVhst9tJl+5zhmE0m814X3PK4KY5sikJKxLKEIZhGIYR3LVt+0Ivm+bTThAnX/gYDcNoNBoJFmlupgluyiObhrAm1ofiuq6iKJRSVVV1XTdNc+AJpmmqqqqqKqWUcx5+iDEmdlQURdM03/eDjZ7n0ZCBHSej67qiKJ7nnS28oii6rgfvMrsyxE7TNM/zGGNJF2QIXdfFRxf+zAVKqWmalmWJE2Pocy76XuMENyuRTT6s88xeYbIsBzUxx3EGcj8hpFaridu2be/t7QUPNZvNcC2u0WjIshzUWmdRQxEl
DLbYth2uJ58tfMqvY/3nB9hut4vFYrAlbTWUUUUihASxcBwnfG6M+bITBzflkU1DWJMc5RGdRoQQTdMsywo/VCwWgzpLuApACKnVapZlBd1plNJqtVqv12da1KCjKyhAsEXTNFVVZ12AWRDX4YFPPv2KxWJQZk3Tpr8a5yy4yYY1sR8HMsZs2240Gpxzznm1WlVVNXg0HOOzO7quG96iKEqtVjvbaIqRSHwC51zTtIEnXLlyZXbvPjvVapVSevZw0mzg3Hj27NmUL5i/4CYY1mQSiqhxBEnU933RKI3IIwFVVQcG/Djn4WQ0C47jBKVVFMV13XGKmn6SJFWrVdM08zSAelH5C26CYU2myVOr1WzbDu6K+I3ZxVWpVMKVEc65bduVSiV4qXAHW/jiMzERmCBnlcvlgdoQYyxc8Z5FGWZHXMem7NrMrgsFN0ORTSys8+ywCdRqtWKxWCwWDcPY398vFou2bYuHms1mqVQqFAqis+3p06fibrg/zLbt8L6O4wQPiYkGoneqWCzu7e1daApJ/3kvbzBVYWAcThAvPupdpi/DTIUPUGxpt9uFQiElnbJiHDeYQyFuiwkpYm5FcG70+/29vT0S6qM915TBTXNkUxLWJH8cyDkXGZRSGuO+wUOqqs607iouXEPfZW5lgBkZFVxENhp+bQwAscGPAwEgNkgoABAbJBQAiA0SyuR4r1c/OMj0Wp9ep+N1OkmXIo0y/cn43S5rtRJ5aySUyXmdTuXePf/kJOmCTM50XfPFaccgZPqTsQ8Odu7cSeStkVAmJ/6+ONMJBSBeSCiTE/8q3s5ykwcgXkgoABAbJBQAiA0SCgDEBgkFAGKDhAIAsUFCAYDYIKFMpbi66j1+nHQpANICCWUqYm4bAAhIKAAQGyQUAIgNEgoAxAYJBQBig4QCALFBQpmKtLSEvy8ACCChTEVZXn6EhALwXGJrG+fDlZUVY3c36VJMTl1dTboIKZXpT+bKykppfT2Rt8a6PAAQGzR5ACA2SCgAEBskFACIDRJKxui6TimllIolu6fHGKOUmqYZy6vBPInYJV2KFyChZEy9XmeMqarKOY/lBSmljLG40tPZF5/Fy4IgSZKqqkmX4gUYNgbIKlVVkVCyyjRNxpgkSb7vU0oty5IkSTxEKVVV9fLly47jiI2WZU0cadd1TdNUFOXsG022o2manueVy+V2u+26rniCZVmapp09QEKIJEmu64bfdNSxRxw4Y6xWq3meF66kDLxs4nRd931fHJeiKDs7O7Ztc85d1yWEiGbgzs6OuCE+RkKI+JQmNk44Ik42QohlWY1GI7hbLpd1XY/eMeJIgxM14k2DU4tzrqqqJEmWZQ0/vD6Mp9FoBLdrtVqtVgs/SggxDEPcdhxnb29vsnexbXt/f//p06fBS5VKpbNPMwwjXJ5zdzQMo1Ao2LYt7rbb7WKx6DhOuPzBEdm2PVD+iGOPPvChhU8VQoj4WPb390Vp2+22OIpGo7G/vx9+cqlUarfb07/pueGIPtnCGo1G8PlH7xhxpOfuK8tycOCjzsk/v8uoB+CsZrPZeC4cxX6/XywWw3cLhcJkbyHLcpAUBMMwgjMvvHEgoUTvaBjGwHej3W6HT4uB8p89Y0Yde/SBZyKhiBvhjzQodjiDnM0vEzs3HP3Iky3s7KOjdow+0uh9ZVm2bTv4KJrN5qjyoMkzFs/zNE1TFCWozw90Nw7U5J89ezbZG3HOB1oihJArV65Mv+NACUX1ddSjYdHHHteBp1O1Wq3VavV6nRAS3IhFRDjOPdlGmXjHc/dljNm23Wg0OOec82q1OqpFj4QyFl3Xw61Nxli4ERsjRVEm62g4d8eBcRzGmKIo47zy3I49hSiltVpNdJooijLmJzaOiHBM/IFPE6mIfUWmCzpNRA8LpXToyYZh47FwzoN4c85t257RG5XL5YEpIYyxcXoBz93R8zxd18XJwRjTdb1arY5TpGmOXZKk8DfH9/3x900JUUmxbXtk
N+REIsIx8Qc+TaQi9hWHH9wVeWTUrAUklLGIOp5pmmJemagOiK51kbBFjVE8WdyYbKqYaZpicoFpmqZpqqpq23Zw3ajX6+LiIHrdwzPconckhOi6vrGxoWkapdRxHDGZhRAiBmKC8nPOxd2gxjvq2Mc58Gq1qmlaUCTTNOOaPhMLEcHwEIn4MD3PC45CfA6KosQ7PjUqHCTyZIsWseO5Rxqx7+XLl0Xxgker1eqoyhp+bTwuzrnneXObSiQqF2KIbvodxUkz8TV2mmMX+54tUoZomlav12Ms/LnhGP8Dd1232WwGLzV9pEbtG8Qxul8GCWUhTJlQFtnANzYWcYWDc67reqVSSdGM5FHDP5AbhmEE4Q5PdoAI7Xa7UCgEn1vEwO1FTR+O4BVkWY6YopII1FAAIDbolAWA2CChAEBskFAAIDZIKAAQm/8HAWoX36HGSrYAAAAASUVORK5CYII=",
      "text/plain": [
       "Tree('S', [Tree('NP', ['I']), Tree('VP', [Tree('VP', [Tree('V', ['shot']), Tree('NP', [Tree('Det', ['an']), Tree('N', ['elephant'])])]), Tree('PP', [Tree('P', ['in']), Tree('NP', [Tree('Det', ['my']), Tree('N', ['pajamas'])])])])])"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEACAIAAAAcPgvyAAAJMmlDQ1BkZWZhdWx0X3JnYi5pY2MAAEiJlZVnUJNZF8fv8zzphUASQodQQ5EqJYCUEFoo0quoQOidUEVsiLgCK4qINEWQRQEXXJUia0UUC4uCAhZ0gywCyrpxFVFBWXDfGZ33HT+8/5l7z2/+c+bec8/5cAEgiINlwct7YlK6wNvJjhkYFMwE3yiMn5bC8fR0A9/VuxEArcR7ut/P+a4IEZFp/OW4uLxy+SmCdACg7GXWzEpPWeGjy0wPj//CZ1dYsFzgMt9Y4eh/eexLzr8s+pLj681dfhUKABwp+hsO/4b/c++KVDiC9NioyGymT3JUelaYIJKZttIJHpfL9BQkR8UmRH5T8P+V/B2lR2anr0RucsomQWx0TDrzfw41MjA0BF9n8cbrS48hRv9/z2dFX73kegDYcwAg+7564ZUAdO4CQPrRV09tua+UfAA67vAzBJn/eqiVDQ0IgALoQAYoAlWgCXSBETADlsAWOAAX4AF8QRDYAPggBiQCAcgCuWAHKABFYB84CKpALWgATaAVnAad4Dy4Aq6D2+AuGAaPgRBMgpdABN6BBQiCsBAZokEykBKkDulARhAbsoYcIDfIGwqCQqFoKAnKgHKhnVARVApVQXVQE/QLdA66At2EBqGH0Dg0A/0NfYQRmATTYQVYA9aH2TAHdoV94fVwNJwK58D58F64Aq6HT8Id8BX4NjwMC+GX8BwCECLCQJQRXYSNcBEPJBiJQgTIVqQQKUfqkVakG+lD7iFCZBb5gMKgaCgmShdliXJG+aH4qFTUVlQxqgp1AtWB6kXdQ42jRKjPaDJaHq2DtkDz0IHoaHQWugBdjm5Et6OvoYfRk+h3GAyGgWFhzDDOmCBMHGYzphhzGNOGuYwZxExg5rBYrAxWB2uF9cCGYdOxBdhK7EnsJewQdhL7HkfEKeGMcI64YFwSLg9XjmvGXcQN4aZwC3hxvDreAu+Bj8BvwpfgG/Dd+Dv4SfwCQYLAIlgRfAlxhB2ECkIr4RphjPCGSCSqEM2JXsRY4nZiBfEU8QZxnPiBRCVpk7ikEFIGaS/pOOky6SHpDZlM1iDbkoPJ6eS95CbyVfJT8nsxmpieGE8sQmybWLVYh9iQ2CsKnqJO4VA2UHIo5ZQzlDuUWXG8uIY4VzxMfKt4tfg58VHxOQmahKGEh0SiRLFEs8RNiWkqlqpBdaBGUPOpx6hXqRM0hKZK49L4tJ20Bto12iQdQ2fRefQ4ehH9Z/oAXSRJlTSW9JfMlqyWvCApZCAMDQaPkcAoYZxmjDA+SilIcaQipfZItUoNSc1Ly0nbSkdKF0q3SQ9Lf5RhyjjIxMvsl+mUeSKLktWW9ZLNkj0ie012Vo4uZynHlyuUOy33SB6W15b3lt8sf0y+X35OQVHBSSFFoVLhqsKsIkPRVjFOsUzxouKMEk3JWilWqUzpktILpiSTw0xgVjB7mSJleWVn5QzlOuUB5QUVloqfSp5Km8oTVYIqWzVKtUy1R1WkpqTmrpar1qL2SB2vzlaPUT+k3qc+r8HSCNDYrdGpMc2SZvFYOawW1pgmWdNGM1WzXvO+FkaLrRWvdVjrrjasbaIdo12tfUcH1jHVidU5rDO4Cr3KfFXSqvpVo7okXY5upm6L7rgeQ89NL0+vU++Vvpp+sP5+/T79zwYmBgkGDQaPDamGLoZ5ht2GfxtpG/GNqo3uryavdly9bXXX6tfGOsaRxkeMH5jQTNxNdpv0mHwyNTMVmLaazpipmYWa1ZiNsulsT3Yx+4Y52tzOfJv5efMPFqYW6RanLf6y1LWMt2y2nF7DWhO5pmHNhJWKVZhVnZXQmmkdan3UWmijbBNmU2/zzFbVNsK20XaKo8WJ45zkvLIzsBPYtdvNcy24W7iX7RF7J/tC+wEHqoOfQ5XDU0cVx2jHFkeRk4nTZqfLzmhnV+f9zqM8
BR6f18QTuZi5bHHpdSW5+rhWuT5z03YTuHW7w+4u7gfcx9aqr01a2+kBPHgeBzyeeLI8Uz1/9cJ4eXpVez33NvTO9e7zofls9Gn2eedr51vi+9hP0y/Dr8ef4h/i3+Q/H2AfUBogDNQP3BJ4O0g2KDaoKxgb7B/cGDy3zmHdwXWTISYhBSEj61nrs9ff3CC7IWHDhY2UjWEbz4SiQwNCm0MXwzzC6sPmwnnhNeEiPpd/iP8ywjaiLGIm0iqyNHIqyiqqNGo62ir6QPRMjE1MecxsLDe2KvZ1nHNcbdx8vEf88filhICEtkRcYmjiuSRqUnxSb7JicnbyYIpOSkGKMNUi9WCqSOAqaEyD0tandaXTlz/F/gzNjF0Z45nWmdWZ77P8s85kS2QnZfdv0t60Z9NUjmPOT5tRm/mbe3KVc3fkjm/hbKnbCm0N39qzTXVb/rbJ7U7bT+wg7Ijf8VueQV5p3tudATu78xXyt+dP7HLa1VIgViAoGN1tubv2B9QPsT8M7Fm9p3LP58KIwltFBkXlRYvF/OJbPxr+WPHj0t6ovQMlpiVH9mH2Je0b2W+z/0SpRGlO6cQB9wMdZcyywrK3BzcevFluXF57iHAo45Cwwq2iq1Ktcl/lYlVM1XC1XXVbjXzNnpr5wxGHh47YHmmtVagtqv14NPbogzqnuo56jfryY5hjmceeN/g39P3E/qmpUbaxqPHT8aTjwhPeJ3qbzJqamuWbS1rgloyWmZMhJ+/+bP9zV6tua10bo63oFDiVcerFL6G/jJx2Pd1zhn2m9az62Zp2WnthB9SxqUPUGdMp7ArqGjzncq6n27K7/Ve9X4+fVz5ffUHyQslFwsX8i0uXci7NXU65PHsl+spEz8aex1cDr97v9eoduOZ67cZ1x+tX+zh9l25Y3Th/0+LmuVvsW523TW939Jv0t/9m8lv7gOlAxx2zO113ze92D64ZvDhkM3Tlnv296/d5928Prx0eHPEbeTAaMip8EPFg+mHCw9ePMh8tPN4+hh4rfCL+pPyp/NP637V+bxOaCi+M24/3P/N59niCP/Hyj7Q/Fifzn5Ofl08pTTVNG02fn3Gcufti3YvJlykvF2YL/pT4s+aV5quzf9n+1S8KFE2+Frxe+rv4jcyb42+N3/bMec49fZf4bmG+8L3M+xMf2B/6PgZ8nFrIWsQuVnzS+tT92fXz2FLi0tI/QiyQvpTNDAsAAAAJcEhZcwAADdcAAA3XAUIom3gAAAAddEVYdFNvZnR3YXJlAEdQTCBHaG9zdHNjcmlwdCA5LjIzKPqaOAAAHWNJREFUeJzt3UFs49iZJ/DXgwAB7EVGbMDO5rI2X2ExC/km2nXIAGMDfDqUA+SwMHWbqeQgCug+zGWK1C3Ijaz2dTCg+pAUFnMh65BLlQ9iY8u7yBzKYm72YBCIJS+Qw9qAWRhEBrJAxnv4uhhGkmVZokRS+v9OEiWSj7L16Xvfe9L77O7ujgEAzOYvsm4AACwDhBIASAFCCQCkAKEEAFKAUAIAKfhO1g2YSRiGYRgyxoQQWbcFYKUVOCtptVpCiHa73W63Oeec86xbBLC6PivovJIwDIUQQRBIksQYi6Lo888/L+i1ACyBomYlURRxzimOMMYkSXIcJ9smAayyomYljDFFUYQQtVpNUZSs2wKw6oqalTDGgiB48uSJ4zhCCEVRPM/LukUAq6vAWUkSlU5830fxFSATRc1KWq1WEATxXc65pmk0MAwAi1fUUNLtdpN11iiKPM9DSgKQlWJPUaPKaxRFvu83m02EEoCsFLtWEkURdXMw2xUgW8UOJQCQE0WtlQBAriCUAEAKil12jfr91unpv/z2t//tBz+oPX2qbG9n3SKAFVXIWknU73tnZ+3z89edDmPs+9/73v/9939njFW2tmpPn2q7u3xzM+s2AqyWgoUSiiDe2dnH29vS2pq2t9c4OFC2t8OrK+fdO6/T+XB9zRhTy+Xa3p62tyetr2fdZICVUIxQ
EvR67vv3rdPTj7e3jLH6/n51Z0fb27vvmXFMOdrdpWcipgDMVa5DyUCu8ai4MJy/3Bd9AGB2eQwlVEx137//zeUlm7kC4p2due/fU1WltLam7++jQAuQuhyFkoFiqryxoe3uNg4OUqmhjjw4YgpAWnIRSu4rps7jXOHVldfppJXyAADJMpRMXkydBwz6AKQog1AySzF1HjDoAzC7xYWSdIup84BBH4CpzT2UzLWYOicY9AF4rDmGkkUWU+cBgz4Ak0s/lGRbTJ0HDPoAPCi1UJK3Yuo8YNAH4D6zhpL8F1PnAYM+AAOmDCVFLKbOAwZ9AMijQ0nRi6lzgkEfWHGThpLlK6bOAwZ9YGVNFErCq6snpslQFJhYctCntLYW/eM/Zt0igPmaNCtpvXsnyuVVK4XMLry6Ci4vkb7B0svFN4MBoOiweAUApAChBABS8O06OLT4Luc8XsHb933GmCRJiqLEd0nyaUDCMAzDkDGmKIokSQNbaGFjglcPltK3tRLTND3Pi6IoDENJksIw1HWdbtN20zSDIKCwwhgLw9DzvPguaJr2+vVrVVUbjYamaYwx0zR93w/D8Mc//vHvfvc7vHqw5O4+MQxDlmXDMJJb2u12fFdV1fh2p9ORZfkOEur1evLluru7syzLcRy6jVcPltuf1Uo0TQuCINmXuY+iKJzzSZ65Omq1muu6yS2u61KGMgCvHiyfwbJrq9UyTfPB3agrhBQ9SQjh+34URXTX930hBNVNBuDVg+UzuPw451wIYdv2cEAJwzDe6Hmebdsj3yerrNFoxLHYcRzbtuOH8OrBchsMJYyxZrMphBjOzCVJqlardDv5JoGYpmlCCNM0wzCMoig5UoNXD5bbiFAiSVKz2TRNc2DMUpIkIcSiGlZInHMa+nVdt1arJR/CqwfLbfQUNUpJkrMhYEK1Ws1xHN/3dV3Pui0Ai/NtKPF93/M8z/PiT07btuNQQhXEIAiEEEIIz/OyaWwRaJpGL1e8Ba8erAJ8nQ8AUoDv4ABAChBKACAFCCUAkAKEEgBIwYh5JcO8szNpbU3s7My7Ncsk6PX8iwv3/fs//sd//Nfvf39Xlpd+eSBYZRON4Hz2058az57Zfz7nCkbyz8/b5+fxalvyxsZ/+fzz/3NzQ3crW1uiXMZv0MPymSgrKa2tRbe3825KcUX9vn9xES8PxBirbG01NC2ZhtDqH/7FxcuTk5cnJ7Suxa4s4xekYTlMFEqU7e3w+nreTSmc8OqKIgitesPGLu6hbG9TJkLrWrTPz1+enDDG4jX9RLmMJUGguCbq4IivvmKM+S9ezL89BRDnF7RM8tTre8brb/kXF5TLUCTCIiFQRAglk/LOzjofPsRFkHSrHsnlU9mnVdxFuYySChQFQsk4lDh0er34Ta6Wy9WdnfmNxVDKk6zaapSqYPgM8g2hZAQqZ3Q+fIiXExcUQRa4wmlyqVA2QzcKYDEQSv5koAgib2zEESTDVg0sac6wcjPkEkIJi2ufOZ/6MXLIufb0KWa+QR5MFEpM1315cnL3i18soEGLMXL0pEATUocnwmm7uzkMf7A6ViuUxHM6vrm4YJ8KELvb28XtLMTT81FSgWytRCgZOSyyZDNNB+bLZVIqhlW2zKFkNSdrLH4AG4AtXyi5b7xjNaeQznVaHUDSkoSS+0oG+GILGR7nXr4uHmRroq/z5dbwQIbx7Bnmhg4b/2XCQheeIScmykrst2+bnte17Tz0ETC9IhVFHw6HvJkolPjn59Xj4/Y//EO2n/atd+8w6XMehifp1Z4+1ff38cLC5CYKJUGv57x71zw8zPbzSvryS8YY5k3MT7Kk0vnZz1CdhckVaUmt8OoKufdi4KWGxypSKAGA3MLiFQCQAoQSAEjBA/NKfN9P3pUkSVGUebZnUBiGYRgyxhRFkSRp5BaYThRFQRBwzjnntIX+3PFfOfnXTz4NYIS7+3W7XVVVVVWVZblSqaiqenR0NOb583B0dMQYU1XVdV3a
YhhGpVIplUqGYSy4MUvGMAxZlkul0s3Nzd2nPzf9rbvdbqfTUVW1VCqpn8iy3Ol0sm415NS4UBIzDKPdbs+7Kfep1+sDZ7csy3GcrNqzTCiaJIPywN9aVdX4dqfTkWV5oe2D4ihAraRWq7mum9ziuq6maVm1Z8lomhYEwUBPdiRFUTjnkzwTVlABQokQwvf9KIroru/7QghUSVLUarVM03zwaVEUhWG44GIZFEUBQgljrNFotFotuu04TqPRyLY9S4ZzLoSwbXv4oTAMzU8URbFtG0EcRirGN4M1TRNCmKYZhmEURRhKSF2z2RRCDHcbJUmqVqt0e2SsASDFCCWcc0VRgiBwXbdWq2XdnCUkSVKz2TRNcyBMS5IkhMiqVVAgxejgMMZqtZrjOL7v67qedVuWE6UkQRBk3RAopHHfwQmCgKpxYRhKkkSd5AwL+JxzTdOQZqcljsvxuAxVVT3Po1K3ZVlBEFCdtdFoYNQMxsDX+QAgBYXp4ABAniGUAEAKEEoAIAUIJTBC1O//z3/916xbAUVSjHkljLHw6sp6+7ZxcIAfHJ0T+in/zocP9Nuu/+m73/39H/5Aq3A92dxczTXJYHLFCSXX11+fntbw69CpGggftPFv/uqvvvud7/znv/zL/64oweUlrZjDGJM3NpStrV1ZXvrFUmEKhQklkCJaADQZPipbW8azZxQmrDdv/te//ds//d3fxWuV0NJl4fW1f3ERLx6ilsvK1lZ1Z0fZ3sYyF4BQsiooHASXl99cXNCWZPiIY0HU77dOT9VyObnmkUgseBj0ekGv1+n1gl7v5ckJ5SyVrS1le3t3exv9oJVVmFBCGXVweYlFPCc3YfhIst68+Xh72zw8vO+YtGYofXkh7h8Fl5dfn55+fXrKGCutrYlyeVeWla0t/LFWR2FCCf3f3/z+91k3JO+GwwctpTwmfMRGpiRjSOvr2t5evLyZf34eXF5SZBnoB01ydii0woQSGCPo9Wgd5WT4qO/vP7bH8WBKMl6yHxReXfkXF92rK//iIlm4FeXy7vZ2vBw6LA2EkqKi8EHVU1o/fLrwEXtsSjIe39zUP7Uh6veDXo9ypYF+EN/YqCYCEBQXQkmRjAwf2t5eKvXOGVOSMaT19YHCbZywvO504sJtXGFB4baIEErybjh80Od5dWcnxeGSdFOS8ZK9m/Dqisorw/0gmheHflBRIJTkERUaOr2ed3aWDB/zmx42v5RkPL65yTc3k4Vb6gclL1zZ3sYElvwrUigpra1l3YQ5isOHf3Hx4fqazT98xBaZkow3cgKLf3HxzaecBRP5c6tIoUTZ3g4+zc5cDveFj8bBwSJz+6xSkvHum8AyPJEfE1jyoEihZDkMhw/GmFoua7u7tadPF18ayE9KMsZ9E1iGJ/JjAktWEEoWYeS35ih8ZD4Ums+UZLzhCSzxRH7aiIn8i1ek33YVX33FGPNfvMi6IRO5L3xQBTEnKUDU73PDULa3i/KqjpfsB8Wz9TCBZTGQlaTpvvBhPHuWz//jIqYkYwz0g+JxdJrITzkL+kFzglCSgvHf2c/t/2shqiSzGJjAgon8c4VQMr3Wu3fu2dnkX7rNmyVLSca7byK/d3YWT+RXtrebh4fLGljnrUihpHFwIOVpakl0exv1+8UKH0mNgwPG2Aq+c8ZM5I9ub7NtW3EVqewKALmFX5wHgBQglABAChBKACAF+Sq7+r5vWZaiKLZtZ9uM+LYQIsOWzEMURUEQcM4557SFrleSJEVRMm3a4iT/xMmXYpJHYbS7/FFVNZN9SafTkWVZlmVVVVVVrVQq9Xr95uZmYQ2YN8MwZFkulUp0Ud1uV1VVWZYrlUq32826dYvQ6XRUVS2VSuonsix3Op1JHoX7IJSMYBiGYRjxXcdxJj9s/kPJ3adokrxGwzDa7XaGTVq85F+KPj8mfxSGZVMr8TyPcy6EUBRF13XTNAeeYJqmoiiKogghoihKPuT7Pu3IOdc0LQzD5PYgCETC
wL7T0XWdcx4EwXD7Oee6rtNZ5teAedA0LQiCZCa/yujf6b5XY/yj8K1MApgsy3Eu7bruwCc5Y8yyLLrtOM7R0VH8UKfTSebh7XZbluVk72MeWQk1Mt7iOE6yyzPQ/qJkJYZhdLvdSqUSb1nlrOTm5mbMP9LwozAss7IrfYZTZjFQ1qpUKnGeouu667rxQ5Zl2bYdP18I0Ww2W63WcF6TLkmSkm0IgiDeomlap9NptVq6rs+1DamjxMq27Xm/evkUhmF84Z7n2bad/CuPfxSGZRNKfN93HKfdbkdRFEVRs9lMjh2M+Zv5vu95XnIL59yyrHm/GeJuFGMsiiJN0wae8OTJk7k2YE6azaYQYvhyVoEkSdVqlW4PjxiOfxSGZRBKqHwQ/3nCMKSywiRRX1GUMAyTWUwURQsYwnRdN24w59zzvOX4jJIkqdlsmqa5guOdkiSNGekf/ygMy6DsalmW4zjxXXpPTliebDQayQQkiiLHcRqNRvJoyfpoMpuYGr3T4oBVq9UGkiDf9+Oa3DwaMFeUkiTbDDCNxZdnLMuqVCqVSsUwjHq9XqlUHMehh+IhfSq13tzcxCP88e6O4yT3dV03eXAat6OyYqVSOTo6emy1jEq58bySgUFTQgcfeZbZGzBvyQukLd1ut1QqrU7Ztd1uJ2eODPwLjX8U7pPZN4NpziWbajrp+H3jRxVFmWs3hDKR4bMsrAEA+YEfGQCAFODrfACQAoQSAEgBQgnAt6J+/3/8+tf++XnWDSmkfP3IQLGYrhtcXhZ3BZmg1zM9z9Y0/Nh61O9bb960Tk9v//CH//fHP9b395uHh1iL61EQSqZX9AWMo37/m4uLqN/PuiEZa717Z719++H6Wi2Xn//wh//7t7/9+vT069NT49mz5o9+VLhf/84KQsn0pLW1okeTFeefn+uvXn24vpY3Npznz/WDA8bY3/71XzcPD/VXr16enLROT/X9fQSUSSCUTI9vbMRrX0Ox+Ofn1tu331xclNbWLE0z/3wxIL656b94Qc95eXLidTrNw0MKNHAfhBJYLeHVlfX27denp6W1tfFdGForh7o/jVevrLdvW8+fr+CyQRNCKIFVQbVVWuVz8sKqfnCgHxzYb99ab95Uj4/Vchkr+I2EUALLLx6g+Xh7q5bLrefPHzs6Yx4e6vv7dJDq8TGGeIYhlMCSSw7QzJJQSOvrdq3WODig/hGGeAYglMDSGjlAMyO+udn6yU8wxDMMoQSW0PgBmtlhiGcYJs5Pb1eWGWOYZ50r4dWV/stfVo+Pg17PePYsfPky9TgSEzs7/osXzvPnjLHGq1fcMFb5nwFZyfSktbWsmwB/Mt0AzewwxEMQSqDwZh+gmR2GeBBKoNjSGqCZ3YoP8SCUQFHNY4Bmdis7xINQAsUz7wGa2a3gEA9CCRTJ5N+gyYOV+hYPQsn06J84vL7OuiErIasBmtmtyBAPQsn06MfHuldXWTdk+UX9PjeMDAdoZjcwxON+8YW2t5d1o9KExStmEvR6xf01w6jfp/bnuY8Qs9++Vba2luDDnPpotqYV4mWfHEIJAKQAE+cBIAUIJQCQAoQSAEgBRnCKRNf1MAwZY7ZtK4oy+wF937csS1EU27ZnP1rqaIF3wjnnnGfYmMeiVeiTzabLkSRJUZRCX9pod1A0hmG02+0UD6iqaopHS+vInU5HVdVSqaR+Istyp9NJsXlzZRiGLMulUunm5ubu7q7b7dIlVCqVX/3qV4W+tJEQSopnRULJ8BE6nY4syzMecJEomhiGkdwS/+0KfWnDUCuZiGmaiqIIITjnuq5HURQ/JIQwTZN6HEIIIUQQBFOfyPM8zvnIE029r2maQohWq2WaZvwEz/OGL5AuYeCk01277/t0VyQ86nKGKYrCOU92DfJP07QgCB5scxEvbVDWsawYklmAZVmWZSUfZYzFnzyu6x4dHU13Fsdx6vU65cN0qJGf6iOzkvH7GoZRKpUcx6G73W63Uqm4rhu3P74i
x3EG2j/Ltaebldzc3MiyHF9j/hmGYRgGvdrxlpFZSeEubRhCyaQ6nU77k2TKend3F/+jkFKpNN0phv+ZDMOI3//JjcOhZPy+hmHU6/Xko9R1H9n+4ff/1Nc+eyihDgKRZTkOf4VAzaYbFIKTf7tCX9owjOA8LAgCTdM45zRoQql78gmSJCXvfvz4cboTRVGkadrAxidPnqSy70AjOedxd2PgoaSFXft9JEmqVqt0O5/DTJNoNptCiIE/0HJcWgyh5GG6rnueFw+++r7fbrfncSIqYYx5Y8+y70AFx/f9SQYgF3bt95EkaSB4FZEkSc1mk2pVyY1LcGkxlF0fFkVR/B8QRZHjOHM6Ua1WM00zucX3/QlLcQ/uGwRBXDT1fV/X9Waz+eBhZ7x2SZKSIYwmxawmSklmKcnnXdY9rAJwHIe6tfV6vVKpUM+WSg9UcSiVSnG58ejoiCUqkY9lGAadgm4cHR3FFRDHceI5CJVKhW4nJyOM2Zc2xkeo1+vdbvcuMXeD2n9zcxPPd0jl2mmMc2STHtRut5OTLwpXSmi327Isy7Icv5jdbrdUKlHJqdCXNhK+GTwRmrlI8xQXcDrKJhRFmaKzM3JfSlim65DPeO20+3CTYMkglKyEWUIJwCQQSpafaZovX76k267rDg/0AMwOoQQAUoARHCiGoNfLugnpiPp903X//p//OVyuXwVGVgIFYLruy5OTu1/8IuuGzIp+d/7j7S1jrLS2pu/v27Va1o1KB6aoASyCf35uet5vLi/ljQ3viy+k9XXT82i1reVYHAcdHCiAJ5ubrLB9nKjf13/5y+rxcXh9bTx7Fr58KXZ2lO1t/8UL5/nzqN+vHh+Lr74qen8HWQkUAN/YYIxF/X7WDXm0uEdztLtra9rACj76wYG2t0dLhT0xzfyvNzgGQgnAXAz0aO7rwkjr63at1jg4oOXKi7u6MDo4ACkb2aMZvwstV+5+8QVjrPHqlfjqq8L15pCVAKRpfI9mPG1vT9vbM123dXq6+/OfF6u/g1ACkI4JezQPov4Oje+0Tk+bP/qReXiYblPnAaEEYFZRv2963tenp6W1NePZs9mnivDNTe/LLyk2NT3Pff/e1rScDxgjlEBhRLe3WTdhhFl6NOOJnZ1gZ4eOXz0+ru/vNw8PUzx+ujDbFYrhs5/+NJUP/BQlezRznWaWzHpyO0EWIzgAjzbFGM0spPX11k9+0vnZz5Tt7ZcnJ9ww/PPz+Z1uOujgADzO/Ho049EE2da7d6bnVY+P1XK59fx5fvo7CCUAk0prjGYWuZ0gi1AC8LDUx2hmkc8JsgglAA/IqkczHk2Q9c7OTM9rvHrlnp3ZmqZsb2fVHoQSgHvloUczXn4myGIwGIqBG4ayteV9+eViTleI8dek8OrK9LzXnU5pbS2TCbLISqAY+MYG/dTAYlhv3nx9epqrHs14AxNkRbm84M4OshKAEaJ+P+j1ctijmYR3dqbt7S34pAglAJACzHYFgBQglABAClB2hZwSQjSbTSFEisekBZXj46d45MzR2sycc845baGLXdhC1wglkFOpL1ceBIGu64wxerOZpqkoim3bk59FCJEMRrliWZbneVEUhWEoSVIYhpZl0W3P8+L4Mkd3ACvDMAzDMOK7juOoqjr57o968uIZhiHLcvICDcNot9uLOTtqJZA7uq4LIYQQQRAktwshTNO0bVtRlJFPmOJEnPPkQegDXAjBOdd1PYoi2u77Pp1OJMSP5oemaUEQZJI6oYMDudNqtRhjpmkOvFd93//ss88Mw6A3v+d5lNXPcq5qteq6LlUTWq1Wp9MJgoC6PJ7naZpGb8s4fOS2gxNrtVoUUBZ8XmQlUCSVSsW2bbodv89nkSyUWJaVLJ1omqYoCsW1AqGsKn6VFgZZCRTJQIn048ePMx4wDMP4dhRFmqYNPOHJkycznmLxaORr+FrmCqEEVprruvEHOOfc87x0h40yIUlSs9k0TXMRAzefoIMDq4vebPG0
i1qtZppm8gm+7yf7UJIkJWsQyYwmbyglWWTFBN/BgXxptVqu6zLGaE4E5QhUwtB1ncZQqNSqadrr168Nw5iwLuD7fnJeSRiGmqYN7GuaJg3W0PM5561WK85TgiDQNI3epcOPZit5dRT+wjBUFMXzvMVMxkMoARhEb8WRc+RoUul9j64yhBIASAFqJQCQAoQSAEgBQgkApAChBGBQ1O/75+dRv591Q6aRVeMRSgAGBb1e9fg46PWybsg0smo8QgkApAChBABSgFACAClAKAGAFCCUAEAKEEoAIAUIJQCQAoQSAEgBQgkApAChBABSgFACAClAKAGAFCCUAEAKEEoAIAUIJQCQAoQSAEgBQgkApAChBGCQtL6ulsvS+nrWDZlGVo3HOjgAkAJkJQCQAoQSAEgBQgkApAChBKCQfN8XQmTdij9BKAEoJEmSFEXJuhV/ghEcAEjBd7JuAEA2dF0Pw1CSpDAMOefVatVxnCiKPM9TFCUIAtM0GWPVapVumKYZBAFjzPf9qU9KB6nVat1u1/M8znkYhrZta5qWfI7v+9QwIYRt25IkxY/att1ut+O7tVpN1/Xx+46/0klO6nmeaZqc8yiKFEWRJMm27cFruwNYVYwxx3Hu7u7q9bqqqnd3d91u9+joiB5tt9v1ej35fFVVu93ujCc1DKNUKtF56YyVSsV13fgJ7XY7vm1ZlmVZ9x2q3W4bhjGwZeS+46/0wZPKshxfuOu6dIQByEpgpdFHuiRJzWaTMUYfvPSQEMKyLPokZ4z5vs85p9sz0jQtTiU4557n6boeJyZCiCAIqBmKoiRzkAeN2XfMlU5yUqrycs41TRv5IiCUANyr2WxaltVqtRhj8Y3ZJfsO7M/f1UEQ0HuVuh6PGqaZet8Hd/R933GcdrsdRVEURc1mc7jii1ACcC9KTKg4klZKwhijmkuM8h26ret6soTh+/7kWcnU+47fkcJcXByhYooQYiAgYjAYYBxKTBzHGVFonFYQBLqu01vU931d16nTwRiLoigOK1EUOY4z+WGn3nf8jnT58V2KIMnOEUEogRVFtYO4ZhEP0MRjN4RSfc75wIfwjKfe3d3VNE0I4bqu7/txRkB9B9M0dV2n2gRVUiY57H37Pnil40/6+eefUwvjJzSbzeEEDfNKAB6gaVqr1UorlNC7d0yOE0VREASTzEDzPK/T6SQPNfm+jz0pPYF9iq3DUCsBGIdmf6SYkjxIkqRJyqVRFLmu22g0pth3ipM+fOT7hqwBVlm32y2VSvHbZGD6xtQMw4iPmZxLMsURZFkeM+Vk8dDBAYAUoOwKAClAKAGAFCCUAEAKEEoAIAX/H7fHBn9BH9P+AAAAAElFTkSuQmCC",
      "text/plain": [
       "Tree('S', [Tree('NP', ['I']), Tree('VP', [Tree('V', ['shot']), Tree('NP', [Tree('Det', ['an']), Tree('N', ['elephant']), Tree('PP', [Tree('P', ['in']), Tree('NP', [Tree('Det', ['my']), Tree('N', ['pajamas'])])])])])])"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display  # renders NLTK parse trees inline\n",
    "\n",
    "# Visualize every parse tree the grammar yields for the sentence\n",
    "for tree in parser.parse(sentence):\n",
    "    display(tree)  # inline alternative to tree.draw(), which opens a separate window"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnkAAABlCAIAAAC+4GI/AAAJMmlDQ1BkZWZhdWx0X3JnYi5pY2MAAEiJlZVnUJNZF8fv8zzphUASQodQQ5EqJYCUEFoo0quoQOidUEVsiLgCK4qINEWQRQEXXJUia0UUC4uCAhZ0gywCyrpxFVFBWXDfGZ33HT+8/5l7z2/+c+bec8/5cAEgiINlwct7YlK6wNvJjhkYFMwE3yiMn5bC8fR0A9/VuxEArcR7ut/P+a4IEZFp/OW4uLxy+SmCdACg7GXWzEpPWeGjy0wPj//CZ1dYsFzgMt9Y4eh/eexLzr8s+pLj681dfhUKABwp+hsO/4b/c++KVDiC9NioyGymT3JUelaYIJKZttIJHpfL9BQkR8UmRH5T8P+V/B2lR2anr0RucsomQWx0TDrzfw41MjA0BF9n8cbrS48hRv9/z2dFX73kegDYcwAg+7564ZUAdO4CQPrRV09tua+UfAA67vAzBJn/eqiVDQ0IgALoQAYoAlWgCXSBETADlsAWOAAX4AF8QRDYAPggBiQCAcgCuWAHKABFYB84CKpALWgATaAVnAad4Dy4Aq6D2+AuGAaPgRBMgpdABN6BBQiCsBAZokEykBKkDulARhAbsoYcIDfIGwqCQqFoKAnKgHKhnVARVApVQXVQE/QLdA66At2EBqGH0Dg0A/0NfYQRmATTYQVYA9aH2TAHdoV94fVwNJwK58D58F64Aq6HT8Id8BX4NjwMC+GX8BwCECLCQJQRXYSNcBEPJBiJQgTIVqQQKUfqkVakG+lD7iFCZBb5gMKgaCgmShdliXJG+aH4qFTUVlQxqgp1AtWB6kXdQ42jRKjPaDJaHq2DtkDz0IHoaHQWugBdjm5Et6OvoYfRk+h3GAyGgWFhzDDOmCBMHGYzphhzGNOGuYwZxExg5rBYrAxWB2uF9cCGYdOxBdhK7EnsJewQdhL7HkfEKeGMcI64YFwSLg9XjmvGXcQN4aZwC3hxvDreAu+Bj8BvwpfgG/Dd+Dv4SfwCQYLAIlgRfAlxhB2ECkIr4RphjPCGSCSqEM2JXsRY4nZiBfEU8QZxnPiBRCVpk7ikEFIGaS/pOOky6SHpDZlM1iDbkoPJ6eS95CbyVfJT8nsxmpieGE8sQmybWLVYh9iQ2CsKnqJO4VA2UHIo5ZQzlDuUWXG8uIY4VzxMfKt4tfg58VHxOQmahKGEh0SiRLFEs8RNiWkqlqpBdaBGUPOpx6hXqRM0hKZK49L4tJ20Bto12iQdQ2fRefQ4ehH9Z/oAXSRJlTSW9JfMlqyWvCApZCAMDQaPkcAoYZxmjDA+SilIcaQipfZItUoNSc1Ly0nbSkdKF0q3SQ9Lf5RhyjjIxMvsl+mUeSKLktWW9ZLNkj0ie012Vo4uZynHlyuUOy33SB6W15b3lt8sf0y+X35OQVHBSSFFoVLhqsKsIkPRVjFOsUzxouKMEk3JWilWqUzpktILpiSTw0xgVjB7mSJleWVn5QzlOuUB5QUVloqfSp5Km8oTVYIqWzVKtUy1R1WkpqTmrpar1qL2SB2vzlaPUT+k3qc+r8HSCNDYrdGpMc2SZvFYOawW1pgmWdNGM1WzXvO+FkaLrRWvdVjrrjasbaIdo12tfUcH1jHVidU5rDO4Cr3KfFXSqvpVo7okXY5upm6L7rgeQ89NL0+vU++Vvpp+sP5+/T79zwYmBgkGDQaPDamGLoZ5ht2GfxtpG/GNqo3uryavdly9bXXX6tfGOsaRxkeMH5jQTNxNdpv0mHwyNTMVmLaazpipmYWa1ZiNsulsT3Yx+4Y52tzOfJv5efMPFqYW6RanLf6y1LWMt2y2nF7DWhO5pmHNhJWKVZhVnZXQmmkdan3UWmijbBNmU2/zzFbVNsK20XaKo8WJ45zkvLIzsBPYtdvNcy24W7iX7RF7J/tC+wEHqoOfQ5XDU0cVx2jHFkeRk4nTZqfLzmhnV+f9zqM8
BR6f18QTuZi5bHHpdSW5+rhWuT5z03YTuHW7w+4u7gfcx9aqr01a2+kBPHgeBzyeeLI8Uz1/9cJ4eXpVez33NvTO9e7zofls9Gn2eedr51vi+9hP0y/Dr8ef4h/i3+Q/H2AfUBogDNQP3BJ4O0g2KDaoKxgb7B/cGDy3zmHdwXWTISYhBSEj61nrs9ff3CC7IWHDhY2UjWEbz4SiQwNCm0MXwzzC6sPmwnnhNeEiPpd/iP8ywjaiLGIm0iqyNHIqyiqqNGo62ir6QPRMjE1MecxsLDe2KvZ1nHNcbdx8vEf88filhICEtkRcYmjiuSRqUnxSb7JicnbyYIpOSkGKMNUi9WCqSOAqaEyD0tandaXTlz/F/gzNjF0Z45nWmdWZ77P8s85kS2QnZfdv0t60Z9NUjmPOT5tRm/mbe3KVc3fkjm/hbKnbCm0N39qzTXVb/rbJ7U7bT+wg7Ijf8VueQV5p3tudATu78xXyt+dP7HLa1VIgViAoGN1tubv2B9QPsT8M7Fm9p3LP58KIwltFBkXlRYvF/OJbPxr+WPHj0t6ovQMlpiVH9mH2Je0b2W+z/0SpRGlO6cQB9wMdZcyywrK3BzcevFluXF57iHAo45Cwwq2iq1Ktcl/lYlVM1XC1XXVbjXzNnpr5wxGHh47YHmmtVagtqv14NPbogzqnuo56jfryY5hjmceeN/g39P3E/qmpUbaxqPHT8aTjwhPeJ3qbzJqamuWbS1rgloyWmZMhJ+/+bP9zV6tua10bo63oFDiVcerFL6G/jJx2Pd1zhn2m9az62Zp2WnthB9SxqUPUGdMp7ArqGjzncq6n27K7/Ve9X4+fVz5ffUHyQslFwsX8i0uXci7NXU65PHsl+spEz8aex1cDr97v9eoduOZ67cZ1x+tX+zh9l25Y3Th/0+LmuVvsW523TW939Jv0t/9m8lv7gOlAxx2zO113ze92D64ZvDhkM3Tlnv296/d5928Prx0eHPEbeTAaMip8EPFg+mHCw9ePMh8tPN4+hh4rfCL+pPyp/NP637V+bxOaCi+M24/3P/N59niCP/Hyj7Q/Fifzn5Ofl08pTTVNG02fn3Gcufti3YvJlykvF2YL/pT4s+aV5quzf9n+1S8KFE2+Frxe+rv4jcyb42+N3/bMec49fZf4bmG+8L3M+xMf2B/6PgZ8nFrIWsQuVnzS+tT92fXz2FLi0tI/QiyQvpTNDAsAAAAJcEhZcwAADdcAAA3XAUIom3gAAAAddEVYdFNvZnR3YXJlAEdQTCBHaG9zdHNjcmlwdCA5LjIzKPqaOAAAGehJREFUeJztnc9v48iVx8venkXb7ukWe8fuoHOwyUY2gL1AANHuUwA3IPowPVdRe5zJQdI/kIi65Uolc8klADWHoOe0EANsDjszB3EA+5BLW5xNsOu+BOLYQbLZsTOiG2g7PzqB9vDGtTWURFESKVH293OiWKxi/XhVr+q9Kmqh2+0yAAAAACTG4qwzAAAAAFxzoGsBAACAZIGuBQAAAJIFuhYAAABIFuhaAAAAIFluzToDAIC043me53mMMU3TZp0XAOYSrGsBAGHU63VN05rNZrPZVBRFUZRZ5wiA+WMB52sBAIPwPE/TNNd1JUlijPm+f//+fQwaAIwK1rUAgIH4vq8oCilaxpgkSZZlzTZLAMwjWNcCAMJQVVXTtEKhoKrqrPMCwLyCdS0AIAzXdR89emRZlqZpqqratj3rHAEwf2BdCwCICrlvHcfBDikARgLrWgDAQOr1uuu6/KeiKLqu0/kfAEB0oGsBAANpt9viZijf923bxqIWgFHBtywAAEOg7VG+7zuOU61WoWsBGBX4awEAQ/B9nyzJ+G4UAOMBXQsAAAAkC/y1AAAAQLJA1wIAAADJgr1RAICv8E5PvbMzunaPjz87Ofnz69f/+/IlY+z4j3/88uJicWHh3tLSP925s/HWW/eWlv75G9/Y/fa3GWPSyoq6sTHDnAOQcuCvBeDa4h4f+xcX
dO2dnbVPT+nav7z0zs46r14xxn7b6Xz56lWML31w9+7a3bu3Fhfv37mjrq/TzW1ZlpaXGbQyuKlA1wIwNzhHR/zaPTnpXOlI7+zMv7z86v7x8fnV9UjcW1paffPNN2/f/taDB4yxh5nMO9/5zke//vW/f/bZyZdf5jY36+++q6yteaen7snJf/zqV//1u9/9529/S3EXFxbeuHWr2+3+9W9/i/7G7Pq6tLLCGJOWl5XVVboJrQyuJdC1AMwG/+LCPT7+6vrysvX55zzIPTnh15++eDE0qXtLS5nl5dd///ufXr9eeuON/zk/7/tYbnOTCYqNtNogleYeHxu2/emLF/Lqak3X9Z2dvmm6x8fOixft01OyOdPNb9y7960HDxYXFh7cvbvx1ls0LeCKWeTW4uKbS0t/+utf//z6dUgBoZXBvANdC0A8iM5O0WDLBN3pX1x8JujRQXDV0nn16lsPHpxfXtJS9U+vX99aXHz1l7/85osvemNllpdJ5XDL7d7WFmNMWV1V1taiF8S/uDA/+uhHn3ySWV4u7e5W33mHMhMlont83Dw68s7O3JOTz69qI7u+rm5sbG9sqBsb0vIy1VLz6IhdrcgHVcuDu3dvv/HGveXll5eXjLH7Kyt/ePmS/MeDgFYG6QS6FoD+iM5O0WBLzk669s7OuEYJgRaUBFeEjLF/+eY3//DyJWPsN198cX9lhV1p5UF2YHl1lfQHJfJobe2rnxsbEdXhUGoff2x+9NH55WV+e7um6yMp6QBkbW59/rl7ciKuznObm+r6+rYsq+vrgfSdKwVMM5WQ2sgsL8tvvcUYo7nIwsLCvaUlxtiXgn86vGmglcE0ga4FNwXR2dkUrkd1dvLlI/v6MC1uBeIjNSlsbiLmenqQZXgkM2+8OEdHhm1/dnIir67W331X29qKN333+Ng9Pm4dH4vW5szysra5qayu7m1thc8YuMmd2i68JmlSwqvxH2/devP27Tu3b99aXOT2Bm5sgFYGUwC6FswlMTo7+UjKGKMBmq75qpGFmmH7LsUGGUXjMvPGi3d6an788QcHB5nl5eo77xhPn07hpc7REV/1BqzNj9bW1PX1kZR9YE5DrTBIidKEhtqaz5BI0/MJmegFgFYGkwNdC1LBIGenaLCN6OwcZLDlQx4bxejKM8bNyDM388aL0WjUDw7OLy+Lu7s1XZ9JJv2LC+fFC9K7YsWStfnR2pq2uTnJXIQ0aKAF+87DApMhar6+OhJaGYwEdC1IhEHOTtFgG8XZKRps2dd1556w7pnE4DlHZt54cY6OSs+efX52ltvcrD59GrvReGzI2tw+PRUdvSQJ6vr6UGvzSPC5VJS9WqQgufEjok0CWhkw6FoQkUHOzlENtnzZxyI4O+PiGph548U7PS09e/bpixeZ5eWarpeePJl1jsIga3P79NR58YKrJXl1lW+wSm6WMNJerd4FMRtxFgitfI2Brr2JRHF2jno6hX190RnR2RkL197MGyP8PA9jrPL229HP86QEfqwoYG3Orq9rm5uP1tbUjY3paJRJ9mpNbguBVp47oGuvA1GcnROeThnP2RkXN9bMGy/1/X3z44/JaEwfgZp1jial77Eibm3ue6xoOsS1V2vynEArpwTo2jQS16f4Bp1OYfE5O+MCZt5EifgRqHmn70esyNpMx4rSIOosmb1aceWKQSsnA3Rt4ogGWxafs1M8nZKoszMWYOadFWN/BGreifIRqzT3lOT2ak0OtPIYQNeOjGiwTcjZKRps52WtBjNvConxI1DzzhgfsUohU96rFVeGGbQydC0b9r9j/P4kzk7RYDu/KzOYeeeIpD8CNe9M+BGrFDLbvVqTc+218jXUtUk7O0WDLUuHszMWYOa9HszkI1DzTrwfsUoh6dmrNTlzqpXTrmtj/BTfIGeneDolDdOf5ICZ99qTho9AzTtJf8QqhaRzr9bkpEorp1fXOkdHe++/H/JAFGdneuZiM2fhe98L3IGZ95phHx4W
fvrTtH0Eat7p+xGr1g9/mE7tEjvR92p1f/azGeQvJsbWypW3364VClFekV5d652eWvv7LH2nU+YUo9GAmffaYx8eXtfzPCmBrM2wzBPiHo6IKmfeCWjl6AfJ0qtrAQAAgOvB4qwzAAAAAFxzoGsBAACAZLkVSyqO4/BrRVEURRkUylFVVZKkoXEZY57neZ7HGNM0rTcd3/dd1w2E0k0xNXqLJEmqqo5WtjhwHMc0zb71EIKmadVqtW+pY6FUKnmeV6vVwuuE1z9vssAdsVyBGhaDkivIdOgraUOlV4xID/i+T9UoRg/EHSrAAVkKqfaQ0MBL+8qnGDc8WTYnnZETe/8aVD+prYHo9JXhSYb9m0h3YlqtVi6Xy2QyuStkWW61WoNC6Wez2Rwat9vtWpYly3KlUqlUKrIsy7IsvtqyrGw2y0N5RPqZyWQ6nU63222325RyNpttt9uTF3lUWq1WpVIZNValUhGrIgkqlQo1RAj5fJ4xlsvlGo0Gj5XNZjOZDOWwtwUty+p2u61Wi5qM7mez2WKxSC0yd/SVtKHSG4hIlcAlodFo0J1MJlMsFsVY4QJMP/mdXC6Xz+fF6GKW6AEeFPJSamUxWTHu0JfOS2fkxNu/QuontTUQkb4yPMmwfzOJQdcSYn+mQXZQKP0UK31Q3Ha7LcsyH507nY44OWg0GmJv73Q6ouySfIsaLopeuWlErJNisRh4zDRNUqiE2IKdToe3GvVPHmRZVkAS5oJwSQuRfFJs4vTCNM3ArItqUpRzIooAD2o+MUviJCn8pbxz8WSbzWZve/V9KTojp2/R5rcGwmV4kmH/ppGIv1ZVVUVR+toQyBCxt7fX19oWiOv7vqIo3OYgSZJlWfxJ0zTr9Tr/KUlSrVYzTZPf0XXddd1RLbcxUqvVNAExt4wxx3E0TaPy6rpOVlmiVCpRFKouEU3TDMMgw2/fZ2zbVhRF0zRFUUqlku/7fUNVVa3VahELUigUGo2GeKfRaOi63vdhSZIURRGLI5ZLUZTeQqWcoZLGCUg+RRSNZqVSqVwui1GoJsvlckA8WBwCTHLS21J9X9putwOP9QrtINLfGUUG9a+hnWsSUlUD0Ykiw8Qkw/5NIB5/bQDf9z3PE10Rvu9TG5Db0jCMQXHr9brv++REUVXV933DMAqFAqVWKpXoMdd1JUkKmP41TeMP8NRIxOMr3AgYhsFL6jhOs9nkQa7rGoZBmo9d6V0qFGWbogc0JT25sLBQqVSoULZtm6Zp2zaF1uv1VqvF07FtW9d1Lv2B0FKp5DjO3t7e0IJQxXJHI+VWrHzevoyxZrMZ4oja29trNBrz4qZikSWNEKWXXU07xAcCSbmuSx4sGv17+8V4Akxt4bpuo9HojTvopX3HwSiD41x0xkBOWL/+Fd65YnlvSmogOkNlmDPJsH8TiE3Xep7Hq9K27VqtJraH53l91wGE67o0PNGFODNyXbder1uW5Xme7/vVapUm6b7v9x2vA2JBa7harZa2ZjZNs1ar8dzSNo16vR4ln9lsli9JdV0XRzTTNLkqpdBWq1Wv1+mZQGi9Xo+u82gNRNmzLCuwJhbbl5SNuANIZO52RgyVtBDpFXFdl4/sXBlblkVzHRrRSAsG3jKeAPu+XyqV+q5Kh750VOa9M4qEdK7JmYsaCKFXhicZ9m8aselaSZL4CqnXOKmqKo1BvGHEsZiH1mq1gK2SMVYqlUjiPc/jRtdBpsjem7TVcJDBc1Y4jhOYLyuKYppmlE4YUFfn5+f82vf93pI+evSIv6J38RExw7qu0xqIJj2BYZS3IEFaue9A39e2nGaGSlq49HJM0/SvcByHKtC2bc/zyDNCF72VNp4AG4ZRLpdJBdq2Ldohorx0JOa9M4qEdK5YSH8NhNArw5MM+zeNOHVtlIGb2oMMqr1tYxgGmbZID9HCi0+ZuV+TdC2dPBEHfTKZ9masWq0ahpEqV4GqqoHMD1ocjISiKLZtD5LmXj0X0YZMKauqSmbJ
wrCPsZVKJdGzLtJoNKL7idNAdEkLSC9jjEYlbs9nV5JPSZG9QawNcpoEmm8MASZTNl+TWZbFu2fEl47EvHfGaTJ3NRAuw7EM+zeE1H3LgvyCNBtqt9vikO37PndwMsZqtZqu69ymQR7QarXamyb1+VS5ScrlsriE9X3fsqy+Ow5GolAoBFbGjuPwFaeu62JovV4faZVZKBQsy3IcZ6hhbZB1moaYOXLWEtElTZRexhi5PHqd7kSz2Qy0uKZpfb2DIwlw+FI1+ktHYq4745SZrxoIl+HrDXnc6DxxDHEn38pMBwP4UarAAYNAKD/zR7vGxVAesd1uZzIZ2lleLBbp4WKxyA9ucvjBL3qMn+5qNpv8ZKeY5gx3nDebzcBhj0DmefHpYIx4UC+Xy1HR6GReJpPh5yvo8GvgLAE/DJfNZvP5vLhfn4fm8/lisUg/A7UaQuDcQrdf+8qyTOdoxVag+2McMk4JfSUtXHrFiPl8nuo8m82SEPJTy7zyKWWqpXABpsOLAQnhWaXjJX3PNYa/tNvtmqYpJiv25fCXDqqibvo646D+FaVzhRBSP2mrgVHpK8OTDPvzAu1mzWQyYxyD7o07B/890PdjNCKO46T/Cyy2bbdarYD9ZGjRxoaWVn2/0sLNffNixUoPY0sabSpJv5ROzlx0RjAGN0eGRfjydPK4c6BrrwHkPCuXy/P+nUIAAABjkDp/7XXCMIyFhYWFhQVVVbe3t6FoAQDgZoJ1LQAAAJAsWNcCAAAAyZLINxpjwTk68i8v9Z2dWWcEgLTjHh83nj//t+fPFxj718ePy0+eKGtrs84UuEFguB5Kem3I2o9/zBhzfvCDWWcEgJTiX1zUDw4az59/dnLCGNt6+HBhYeG/f/97xlh2fb385Im+syOtrMw6m+D6g+F6KOld1wIABlHf328eHf281WKMyaurlbff5mtZ7/TU2t+3W63ys2flZ8/y29uFx4+x4ABgtkDXAjA3OEdHjcND+/Dw/PIys7xc3N0tP3mibmyIzyhra7VCoVYo8Id/3mpllpdLu7uFx48DDwMApgN0LQBphy9VPz87Y4zlt7f3trZKT56Ex9K2trStrZqu24eHzaOjH33yyY8++SS7vl54/Fjf3oZDF4BpAl0LQErxLy7sw0Nrf5/csdn19bKul3Z3R3LBSisrpSdPSk+eeKendqtl7e9Xbbtq2xEVNgAgFqBrAUgd9uFh4/nzvu7YsVHW1oynT42nT93jY2t/n2zLhm3rOzu9hmgAQLxA1wKQFrgW5O7Yws6OtrUV71vUjY36e+/V33uPNPoHBwcfHBzIq6v69jYOCwGQEOnVter6ev3gYNa5ACBxuHVXdMdO4biOvrOj7+zwg0Pk0M1tbhZ2dnBYCIwEhuuhpFfXMsbOLy9nnQUAkoLcsY3Dw09fvGBX7tjp71qSVla4bbnx/DkdFiLbMqn8aWYGzC8YrsNJta4F4FpCG4M/ODhgV+7YNJzGUTc21I2NWqHAswfbMgBxAV0LwJSghWP94IBWAMXd3XQuHMm2TIeFrP19flgIH6ICYGygawFIloA7dl4couJhIXyICoAJga4FIBHIHSt+SdGchTt2cvAhKgAmB7oWgJghFcuP7qTEHTs5+BAVAGOTXl17/84dxph/cZFyUxsABN/Hy4/uXEtbKz5EBXrBcD2U9P6nnnN0tPf++83vfz/2s/wAxEjgj+1u4B6iwCc48CGqGwiG66Gkd10LQMoJ+WO7GwU+RAXAUKBrARiN3i8pYhlHhHyICrZlcMNJr66VVlaKu7vK6uqsMwLA13CPjz84OIB7chC9H6Ky9vdRUdcbDNdDSa+/FoB04l9cMMZujjt2crzTU1iSwQ0HuhYAAABIlsVZZwAAAAC45kDXAgAAAMnyD9/97nc9z/M8jzEmSdKs8/P/aJr28OFDRVEmT8r3/V/+8pdMKKDjOJ7n+b7/8OHD8FD+M521dF3p2wT8Z4BSqfSTn/xka2urb+gYxCh7
4cxKMsWIIRU7HdA92VUleJ4nSdLS0pLv+0tLS0NjkeR/+OGHIcIfLsyO4xiGUa/XP/zww6Ojo729vYmKEY2UjO3TlpxcLpfNZjOZTLFY7I5ILpcbNUp0KpVKq9WKKylZljOZTKfT6Xa77XY7l8vJspzNZtvtdnhoq9XK5XKZTCZ3hSzLcWUM9KXRaMiyLMtyPp/vXrVIJpOhFukbpVKpNJvNuDIwSPZiF/iZSCalw5PK5XJUz7MC3dOyLFmWK5UKFTafz1cqlejRw4U/ZCBtt9u8T1Fljprz8UjD2P6LX/xiypLDut2uaZrU2JSh6CSqa+OFKl2UYFFAw0O7Xy9pq9WSZTn5LN9oaNwR7+RyufABJUZdO4gkBH6GkjmdSovCTe6ejUYjMNeJV9eG0Gw2R3pROplEeKYpOYuMsUajoet6uVyu1+viklfTNMMwarWaqqqapmma5rouX4nTT03A930xOj2jqqqiKLqu0zo9PFmiVCr1vR+e7FB0XXdd13GcMUJF6NVRngTJYdu2oigkCbVaLRBqGAZJl6IopVIpIJmu6+q6rl5hGIZt2xQ0SPbCBZ7f5zkxDIPuRClLqiRzaPcMqb1JSFUlTBMy4Yp3qtXq9vY2Dw2R5HBCBlLP86ihbdvm8sxDQ8ZYLuqu61IfVBSF53+SsX2SksYiPIlLTqvVIutxp9PJZrMBVcwY4zOC3vlXyDS/1WqJFr9msymum8OTJfpO1sKTDYHWSWQz6U0/PDRQ0k6nM4YNAIxE+LrWsqxisciboFgsyrIstpd4bZqmaZr8Z7vdFo1F1OiBdw1aKIQIfLPZDHhhcrncIIt34F2zksxBxQzpnlFqbwxucvcMN5aESDJnqA15UGjfdW2UMZY8j9QHO52OmKuxx/YoJe3LJMIzTclZtCyL/OGSJCmKEphuZLNZPlXXdT26zjdNs1arce+3pmnVapVPfxJKdii0EupdBg0N9TzPuIIWUnO6BeN6QGLAm6BerweagybOjuM4jqOqaqfT4UGWZVWrVVVV6aeiKNVqdfItIZqm8U0WjDHHcWjKHzF6qiQzpHsmVHs8tfRUQkoIkeSEiDjGappG/U6SJMMw+P2xx/YJSzqe8ExTcm7Ztu15nmVZ9GLLssRqDbz4/Pw8YrqO4wQsS4qimKZJrZJQslGoVquapum6PlKoJEl8QBnUnGBqKIrSq1z5NRk5FUUhlUAGMTE0oBsGCcOoVKtV0zSp+/CLkaKnRDJDumdytUekpxKmRoixNFySEyLiGDtogjXe2B5LSccQnmlKzq1SqSS+Q1VV3/cn1+2qqnqeJ87rfd/n0+EZJitJUrVaNQyj75pjUKgkSVOQcsC5f/9+78SWi2Wvk95xHN5nSqWSbdtcKhzHaTab/ElFUUZyBUVH0zTTNGkiP9KilpgLyUyu9oi5qIR4kSTJdd3AIEaDcLgkJ0RCQ3c4sZR0DOGZpuQslstl8bemadF3OpCU8J/iCFgul8V5kO/7lmUF3jUGsSRLU5u+G6+GhoLpQNYncfOR2OF1XRfFoF6vi7Ln+z7vTiQhYsqFQsE0TVFh0BHDKLkKEXiClraWZY03R06/ZEapPdd1ySE1nlZOfyXES61WMwxDlKVarWaaJhsmyQmR0NAdTlwlTbXw5HI5y7LIeVssFrPZLO2Q5ocauXM7n88zwe/dvdokTc7nbDabz+dFx7JlWbRvgpJtNBpd4azkoGQtywoc/svlcuKxp77JhkPufVmWuSe83W5nMhnykA8NFY9hRXkdiAU6ZcsbOnD0jUSuUqnk8/lisUg/SZL5aUWKSJv+xY1L9ACPxbdZDZW9cIEncrlc9L1CM5FMOpMaKCZPf2ivH1R7YqEYY5lMJsrWsFlVQqqg7Ui8SsXBMESSw8U1PJRkgEZ7Cgo08aAxtlKp8MPugTqfZGwf2mcHMbbwTF9yJv3vAd/3aZqgqmqv5ZmHxrtOTyhZkEIcx5Ekqa8Ji/Yi9bXWkoQMishTZgPkNoRw
gWeM6breu1fr+hFee3xpO/V8zTHceCNWaRRJToLpj7GzKunUwP/8ABAbtm23Wq25254DAEia9P5XPADzgud5qqqKWy6hbgEAIljXAgAAAMmC/9QDAAAAkgW6FgAAAEgW6FoAAAAgWaBrAQAAgGT5P/H0aeW/EvDNAAAAAElFTkSuQmCC",
      "text/plain": [
       "Tree('S', [Tree('PERSON', [('Antonio', 'NNP')]), ('joined', 'VBD'), Tree('ORGANIZATION', [('Udacity', 'NNP'), ('Inc.', 'NNP')]), ('in', 'IN'), Tree('GPE', [('California', 'NNP')]), ('.', '.')])"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display\n",
    "from nltk import pos_tag, ne_chunk\n",
    "from nltk.tokenize import word_tokenize\n",
    "\n",
    "# Tokenize and POS-tag the sentence, then recognize named entities\n",
    "tree = ne_chunk(pos_tag(word_tokenize(\"Antonio joined Udacity Inc. in California.\")))\n",
    "display(tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stemming & Lemmatization\n",
    "\n",
    "### Stemming"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['first', 'time', 'see', 'second', 'renaiss', 'may', 'look', 'bore', 'look', 'least', 'twice', 'definit', 'watch', 'part', '2', 'chang', 'view', 'matrix', 'human', 'peopl', 'one', 'start', 'war', 'ai', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "from nltk.stem.porter import PorterStemmer\n",
    "\n",
    "# Reduce words to their stems, reusing a single stemmer instance\n",
    "stemmer = PorterStemmer()\n",
    "stemmed = [stemmer.stem(w) for w in words]\n",
    "print(stemmed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lemmatization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['first', 'time', 'see', 'second', 'renaissance', 'may', 'look', 'boring', 'look', 'least', 'twice', 'definitely', 'watch', 'part', '2', 'change', 'view', 'matrix', 'human', 'people', 'one', 'started', 'war', 'ai', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "from nltk.stem.wordnet import WordNetLemmatizer\n",
    "\n",
    "# Reduce words to their dictionary form (lemma); the default part of speech is noun\n",
    "lemmatizer = WordNetLemmatizer()\n",
    "lemmed = [lemmatizer.lemmatize(w) for w in words]\n",
    "print(lemmed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['first', 'time', 'see', 'second', 'renaissance', 'may', 'look', 'bore', 'look', 'least', 'twice', 'definitely', 'watch', 'part', '2', 'change', 'view', 'matrix', 'human', 'people', 'one', 'start', 'war', 'ai', 'bad', 'thing']\n"
     ]
    }
   ],
   "source": [
    "# Lemmatize verbs by passing pos='v', applied to the noun-lemmatized output above\n",
    "lemmed = [WordNetLemmatizer().lemmatize(w, pos='v') for w in lemmed]\n",
    "print(lemmed)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
